I have been working recently on a project where I had to read data from different types of PDF documents. I would like to ask whether there are plans to create an Object in BP that deals with PDF manipulation, or an update that enables better handling of PDF documents.

For now we have only two options for reading data from a PDF:

1. We can simply copy the data with Global Send Keys.
2. Use Surface Automation to read certain regions in the PDF.

I think that is not enough, for these reasons: the data are pasted in a different structure, not in top-to-bottom order as in the PDF, so if a document has a large amount of words, tables, etc., it is almost impossible to catch (calculate) all the needed data. It takes too much effort to extract the correct data without hard coding in the calculation stages, even if it is possible.

Let me briefly describe why this is as bad as it sounds. There are PDF generation programs (I'm looking at you, troff) that lay all the plain text on a page first, then lay all the italic text, then all the bold text. Some programs want to lay text down very precisely, so if you're lucky, they'll use the TJ operator, which lays out text with specific kerning. If you're not lucky (which is most of the time), they'll instead lay out the text with a set of moves before every single glyph on the page. What about the cases where someone subtly changes the font size for a greater distinction between upper and lower case, or simulates small caps? And what if your text is laid out on a curve or in an unusual orientation (maps, ads)? This is why, when I wrote the find text tool for Acrobat 1.0, it took me two months of sweat to handle as many of the edge cases as I could. This is not editing text - it's just trying to find a single word or phrase.

I'm not going to recommend a library for you - sorry - I gave xpdf a brief look over and it's not clear whether it has PDF generation capabilities or is simply a consumer of PDF. PdfLib, which is a commercial product, appears to be designed to generate PDF, although it's not clear whether it can consume it, but you could certainly get both sides by gluing them together. If it were me, I would use tools that I've developed, and I'd still be a little shy of this task. My library is being used by Atalasoft, the company I work for, to generate PDFs from whole cloth and to do editing within a very limited domain (annotations, document metadata). The hardest part is that we do our very best to hide the complexity of PDF from our customers. In general, our customers want us to understand the spec instead of them and make the rest easy - but tasks like this (redaction is another one) are really hard to do without understanding the depth of the PDF specification.

If you start entering the library world of PDF manipulation, you should start by reading the spec, especially chapter 8 (Graphics) and chapter 9 (Text), and you'll get a better understanding of what you're going to have to do with the library.
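To make the ordering problem concrete, here is a minimal Python sketch of rebuilding reading order from text fragments that were emitted in a different order (plain first, then italic, then bold). The `Fragment` type, the `reading_order` function, the coordinates, and the `line_tolerance` value are all hypothetical and chosen for illustration; the sketch assumes horizontal, left-to-right text in a single column and that some PDF library has already given you each fragment's position.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A run of text plus the position of its baseline origin (hypothetical type)."""
    x: float
    y: float
    text: str


def reading_order(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then sort each line left to right.

    Assumes PDF-style coordinates (y grows upward), so a larger y is higher
    on the page. The tolerance is an arbitrary illustrative value.
    """
    lines = []  # each entry: [baseline_y, [fragments on that line]]
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])
    return [
        " ".join(f.text for f in sorted(members, key=lambda f: f.x))
        for _, members in lines
    ]


# Emission order as a generator might write it: plain text, then italic, then bold.
emitted = [
    Fragment(72, 700, "The"), Fragment(130, 700, "brown"),
    Fragment(72, 686, "jumps"), Fragment(150, 686, "the dog"),
    Fragment(165, 700, "fox"),    # italic pass
    Fragment(110, 686, "over"),   # italic pass
    Fragment(95, 700, "quick"),   # bold pass
]

naive = " ".join(f.text for f in emitted)   # text in the order it was emitted
ordered = reading_order(emitted)            # text regrouped by position

print(naive)
print(ordered)
```

Even this simple case needs positional sorting rather than emission order. The sketch does nothing for the harder cases described above: a move before every glyph, subtle font-size changes, or text on a curve or at an unusual orientation would all need considerably more logic.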