[WARNING] Don't use the Microsoft "Print to PDF" printer

HankArnold

I just found out that the built-in Microsoft "Print to PDF" printer generates *HUGE* PDF files. I printed a small (18KB) Word document with it to generate a PDF file. The result was a 484KB file! A CutePDF printer created a 36KB file!!! Bottom line: don't use the Microsoft program.
 
Although I agree with your advice, even 484K can't be considered huge by any modern standard.

Microsoft's PDF conversions have always been larger than any other utility I've used over the years. I currently use PDFCreator and it, like CutePDF, creates far more compact files.
 
Hyperbole aside...

This depends on your use case.

Microsoft Print to PDF outputs a PDF v1.7 file, configured for print use (600dpi) at full color. And you can't change it. Worse, this process converts the print spool into a PDF, which results in a raster rendering of a document stored as a PDF. This is a GRAPHIC not text.

Modern versions of Word (read: 2016 and newer) have the built-in ability to save as PDF. This also results in a PDF v1.7 file, but it is a text document properly converted to PDF, complete with bookmarks and everything else. This process is comparable to, but not as flexible as, what Adobe Acrobat does via its Word plugin.

Finally, CutePDF is outputting a PDF v1.4 file, without any frills. It looks like they're running an older version of Ghostscript through a minification process. This has a massive negative impact on print quality in some circumstances. It is also based on a print and is therefore a raster rendering of a document and not the document itself.

My test document stats:
Pages: 2
Words: 440
Characters with spaces: 2,600
Paragraphs: 33
Lines: 44

This document has basically 1 paragraph on page 2, and is 100% text.

Word document itself is 23KB
CutePDF Print is 41KB
Word save direct to PDF is 130KB
Microsoft Print to PDF is 549KB
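
To put those four numbers side by side, here is a minimal Python sketch that expresses each output as a multiple of the source Word file. The sizes are the ones listed above; the dictionary and the function name `ratios_to_source` are only illustrative.

```python
# Sizes in KB, copied from the test results listed above.
sizes_kb = {
    "Word document": 23,
    "CutePDF print": 41,
    "Word save direct to PDF": 130,
    "Microsoft Print to PDF": 549,
}


def ratios_to_source(sizes, source="Word document"):
    """Return each size as a multiple of the source document's size."""
    base = sizes[source]
    return {name: round(kb / base, 1) for name, kb in sizes.items()}


ratios = ratios_to_source(sizes_kb)
for name, ratio in ratios.items():
    print(f"{name}: {sizes_kb[name]} KB ({ratio}x the source)")
```

On these figures the Microsoft printer output is roughly 24 times the size of the source file and about 13 times the CutePDF output.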

So I wouldn't say never, but I will say if you're stuck in the old busted process of printing to PDF, yeah... CutePDF seems more efficient on the tin.
 
@HankArnold I forgot to thank you for bringing this up.

I'm tracking a long-standing and very strange condition with Point Central. I've got users uploading PDF files and, at random, it complains they're too big. Now, the size of the files is actually irrelevant, BUT I think something that Microsoft Print to PDF does when it makes its files seems to be in play.

Since informing them to use Acrobat's printer instead configured for 150 dpi prints, the number of complaints has fallen off drastically.

And while I'm on that subject, be aware that Acrobat 2020's PDF printer makes huge files by default too! But it can be adjusted in the print preferences, whereas Microsoft Print to PDF has zero configuration settings.
 
Just to tag on a tangentially-related item - I have a broker client whose broker-dealer has suddenly gotten picky about receiving only PDF/A files (fonts embedded). This is possible with some methodologies and not possible with others - something else to watch out for.
 
This is a GRAPHIC not text.
No the text can be selected, copied and pasted into Notepad, so it isn't graphical.

I use Microsoft Print to PDF often. Single-page invoices from my POS software* end up as PDFs under 200KB, size isn't an issue for me. It's more reliable than the software's own export-to-PDF feature.

* Using MYOB RetailManager 12.5
 
No the text can be selected, copied and pasted into Notepad, so it isn't graphical.

I use Microsoft Print to PDF often. Single-page invoices from my POS software* end up as PDFs under 200KB, size isn't an issue for me. It's more reliable than the software's own export-to-PDF feature.

* Using MYOB RetailManager 12.5
Do not mistake Acrobat Reader and other PDF software's ability to instantly OCR a graphical PDF into text, for the document being actual text.

Documents made via a printing PDF process are nowhere near as searchable, or properly indexed. They serve one purpose... to be printable.

But, as you've pointed out that's good enough for a TON of applications. It's not for me in only one circumstance so far.
 
The result was a 484KB file! A CutePDF printer created a 36KB file!!!
What year is it again? 1993? I mean, I guess if you're indexing millions of PDF documents this could be a problem but documents haven't been a type of file you've had to worry about taking up too much space since the early 90's.
 
I've got a Fujitsu desktop scanner and have recommended them for a long time. They're fast and create efficiently sized PDFs. I got excited a few years back when good phone scanner apps came out and was ready to replace it. But the phone app would create a 250KB file, and the same page scanned on the Fujitsu would be about 32KB.
 
What year is it again? 1993? I mean, I guess if you're indexing millions of PDF documents this could be a problem but documents haven't been a type of file you've had to worry about taking up too much space since the early 90's.
It adds up quick at a mortgage office!
 
What year is it again? 1993? I mean, I guess if you're indexing millions of PDF documents this could be a problem but documents haven't been a type of file you've had to worry about taking up too much space since the early 90's.
This is just the approach that leads to bloated operating systems and applications. Just because resources are cheap/plentiful/available, doesn't mean that using them to the max should be a mission target. None of this bloat comes at zero cost.
 
It adds up quick at a mortgage office!
Seriously? Let's do a little math. Even at 1MB apiece, a 1TB drive would hold over a million documents. Let's say a mortgage company requires 10 documents per customer. That means they'd need 100,000 customers to fill ONE 1TB hard drive that can be purchased for $40. And there's no way they have 100,000 customers unless they're a huge company with dozens/hundreds of computers.
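
The estimate above can be checked with a few lines of Python. This is only a sketch of the post's own assumptions (1MB per document, 10 documents per customer, a 1TB drive); it uses decimal units, in which the drive holds exactly one million such documents.

```python
# Assumptions taken from the post: decimal units, 1 MB per document,
# 10 documents per customer, one 1 TB drive.
DRIVE_BYTES = 10**12        # 1 TB
DOC_BYTES = 10**6           # 1 MB
DOCS_PER_CUSTOMER = 10

docs_per_drive = DRIVE_BYTES // DOC_BYTES
customers_per_drive = docs_per_drive // DOCS_PER_CUSTOMER

print(f"Documents per drive: {docs_per_drive:,}")
print(f"Customers per drive: {customers_per_drive:,}")
```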

This is just the approach that leads to bloated operating systems and applications.
Apples and oranges. What bogs down the OS and programs isn't the amount of space they take up, but how much processing power/RAM they require to run. Do you really care that Office 2019 takes up 20x more space than Office 97? It still only takes up less than 2% of even a small SSD. Storage capacity has been plentiful for a very long time. Now if we were talking a new type of encoding that would make 4K video take up 1/2 the space then that's definitely worth looking into, but other than that the size of document files really doesn't matter for 99.99999999% of people.
 
@sapphirescales Your math is based on a faulty premise.

Inefficiencies amplify with scale. The data I posted above was from a simple, all-text document just a bit more than one page long. A 5x storage increase in that file is negligible, from 100KB to 500KB. BUT... I'm not working with that simple of a document, am I?

Oh no... in this specific case I'm dealing with full mortgage packages. Which takes a file from 10MB to 150MB. This results in 100GB of document creep PER YEAR. When the platform is holding almost 20 years of loans, it adds up.

I mean, you're still not wrong. I've got a server with 1.5TB of SSD storage that's a bit more than 5 years old. The new platform simply needs 3TB of storage and it's good for the lifespan of the replacement server. We plan for these things.

But that doesn't make the usage efficient, which is why I'm working with the client to determine a proper data lifecycle. That means after x years, I'm going to run a script on the folder that passes each PDF through Ghostscript to lower the quality settings to 150 dpi, which in general cuts the sizes to a quarter or less of what they are currently.

If they use proper methodology generating their PDFs, that compression process is rendered useless, because they could indeed store many millions more documents. But it seems to me that you're having trouble imagining a business at scale that can actually generate data like this. Which is pretty silly considering I'm watching it happen at an office that owns 30 desktops. They aren't exactly huge.
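
The script itself isn't shown in this thread, so the following is only a rough Python sketch of what such a pass might look like, not the actual one. It assumes Ghostscript is installed and reachable as `gs` (on Windows the console binary is usually `gswin64c`), and it relies on Ghostscript's `/ebook` preset, which targets roughly 150 dpi images. The function names and folder layout are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_gs_command(src, dst, gs_binary="gs"):
    """Build a Ghostscript command that rewrites src as a smaller PDF at dst."""
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",      # write a PDF as output
        "-dPDFSETTINGS=/ebook",   # preset that downsamples images to about 150 dpi
        "-dNOPAUSE",              # do not pause between pages
        "-dBATCH",                # exit when the input is processed
        "-dQUIET",                # suppress routine console output
        f"-sOutputFile={dst}",
        str(src),
    ]


def recompress_folder(folder, out_folder, gs_binary="gs"):
    """Run every *.pdf in folder through Ghostscript, writing into out_folder."""
    folder, out_folder = Path(folder), Path(out_folder)
    out_folder.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(folder.glob("*.pdf")):
        dst = out_folder / src.name
        # check=True raises CalledProcessError if Ghostscript fails on a file.
        subprocess.run(build_gs_command(src, dst, gs_binary), check=True)
        written.append(dst)
    return written
```

Calling `recompress_folder("loans/2015", "loans/2015-small")` (hypothetical paths) would leave the originals untouched and write the reduced copies to a separate folder, so the results can be spot-checked before anything is replaced.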

Also, cloud migration means paying per GB of storage, that's a direct monthly cost that needs containment. Furthermore every GB of data you have to restore during an outage is that much more time you're down costing even more money.
 
full mortgage packages. Which takes a file from 10mb to 150mb.
Jesus, I don't have any mortgage clients myself but when I did my personal mortgage the documents were less than a meg each. What are they doing, storing high resolution photos in their PDFs? Even my entire mortgage agreement which is like 300 pages long is only like 5MB.

30 desktops
1,000,000 files divided by 30 desktops is over 30,000 documents each. Even if each client required 10 documents, that's more than 3,000 clients per year per agent! Divide that by 260 days/year and you're looking at almost 13 people PER DAY. That's just not realistic. What mortgage agent closes on 13 properties per day??? 1,000,000 files is more like 10 years of docs for 30 agents. Unless those documents really are 100MB+ apiece, I don't see there being a problem. But honestly, I don't know how a PDF document could get so big unless it was storing photos or is so ridiculously long that it shouldn't be in one PDF.
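
For anyone who wants to rerun that back-of-the-envelope estimate, here is a short Python sketch using the same inputs as the post (one million files in a year, 30 desktops, 10 documents per client, 260 working days). The variable names are only illustrative.

```python
# Inputs taken from the post; all of them are the poster's assumptions.
TOTAL_FILES = 1_000_000
DESKTOPS = 30
DOCS_PER_CLIENT = 10
WORKING_DAYS = 260

files_per_desktop = TOTAL_FILES / DESKTOPS
clients_per_agent = files_per_desktop / DOCS_PER_CLIENT
closings_per_day = clients_per_agent / WORKING_DAYS

print(f"Files per desktop:  {files_per_desktop:,.0f}")
print(f"Clients per agent:  {clients_per_agent:,.0f}")
print(f"Closings per day:   {closings_per_day:.1f}")
```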
 
@sapphirescales The mortgage package isn't just the documents you see, it's everything else that went with them.

So yes... high resolution pictures of the property are part of the mix. But it's also a horde of tax documents, various versions of some bits as they get worked on. It's utterly nuts.

And you're right, when you use proper processes the final file is 50MB, but if someone anywhere along that path uses Microsoft Print to PDF... the entire file after that gets goofed up and balloons. I've seen them get as large as 500MB!

It's nuts when you look at two loans and the PDFs are essentially identical but one is 5 times larger. But it does happen.

Also your estimate of closure rate is conservative. But I'm not really at liberty to discuss how. I will simply say that they close a bit more than that most days.
 
Do not mistake Acrobat Reader and other PDF software's ability to instantly OCR a graphical PDF into text, for the document being actual text.
Huh?
I just open the PDF created by the Microsoft print driver, in my basic PDF reader Sumatra, and I can highlight and copy text immediately. That isn't some real-time OCR feature!
 
Huh?
I just open the PDF created by the Microsoft print driver, in my basic PDF reader Sumatra, and I can highlight and copy text immediately. That isn't some real-time OCR feature!
It is! But that doesn't mean the file can be readily indexed via nonspecialized readers.

The PDF libraries out there are rather good these days, and make so much of that seamless. I think even Chrome has the ability to do it, which means New Edge too.

But that's computationally heavy to do when you have a folder with a few hundred of them, and you're trying to run a content search. Which isn't something most people do anyway, so it doesn't matter much.
 