Handling PDF Documents

For many translators, Adobe Acrobat Portable Document Format (PDF) files are a nightmare. The same question resurfaces periodically in forums and mailing lists: How can I edit this PDF document?

Short answer: you can’t.

Long answer: it is possible, but only to a limited extent, and quite possibly your client will not be happy with the results.

The point is that PDF documents were never meant for editing or translation. About ten years ago, Adobe Inc., their creator, successfully responded to a market need: documents that could be easily exchanged, printed, and viewed, but not modified, on all kinds of computers. That is, the computer equivalent of a printed document. This is exactly what Acrobat documents are.

They must be considered “printed” documents, not “editable” documents. The format has been so successful that there is a steady flow of PDF documents to translate. However, because most computer formats are editable, there is a problem: despite the original intention of the format’s creators, people cannot be convinced that PDFs are uneditable, and so they ask for translations of and modifications to these documents.

Many translators already know the answer: you cannot deliver a translation as a PDF, at least not in the same PDF that you received. The format is not (extensively) editable.

What you can do is try to extract the text from the PDF and process it in your favorite word processor. The most straightforward and economical procedure is to press the “Select Text” button in Acrobat Reader, press Ctrl-A (Select All), and copy the contents to the clipboard (Ctrl-C). Then you can paste the text into your word processor (Ctrl-V).

Depending on the complexity of the page layout, this may prove minimally satisfactory. Although I haven’t used this process for some time, I have just copied and pasted a PDF into Word, and the result is usable, at a pinch: fonts and type sizes are kept; tables disappear, although their content is preserved (in a somewhat mangled form); illustrations are gone. The main problem is that each line ends in a hard carriage return/line feed, which generally has to be replaced by a single space in order to have continuous sentences again. I have developed a small Word macro that searches for carriage returns and replaces them with spaces. This, however, has to be done one line at a time, under human supervision, because the system cannot know when the carriage return should be kept (for instance, in headings, in lists, and at the end of paragraphs).
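
The same cleanup can be roughed out in a script before the text ever reaches Word. The Python function below is only a minimal sketch, and it is not the Word macro mentioned above. It rests on one assumed heuristic: a line that is clearly shorter than the widest line is taken to be a heading or the last line of a paragraph, so the break after it is kept, while every other break becomes a single space. The function name `join_hard_breaks`, the `short_ratio` threshold and the list-marker pattern are all hypothetical choices for illustration.

```python
import re

# Assumed list markers: "-", "*", "•", or a number followed by "." or ")".
LIST_ITEM = re.compile(r"^(?:[-*•]|\d+[.)])\s+")


def join_hard_breaks(text, short_ratio=0.7):
    """Replace line breaks inside paragraphs with single spaces.

    Heuristic (an assumption, not a rule): a line much shorter than the
    widest line ends a block (heading or last line of a paragraph).
    Blank lines end a block, and a list item always starts a new one.
    """
    lines = [line.strip() for line in text.splitlines()]
    width = max((len(line) for line in lines), default=0)
    blocks = []
    current = []

    def flush():
        # Close the block being collected, if there is one.
        if current:
            blocks.append(" ".join(current))
            current.clear()

    for line in lines:
        if not line:
            flush()
            continue
        if LIST_ITEM.match(line):
            flush()
        current.append(line)
        if len(line) < short_ratio * width:
            flush()
    flush()
    return "\n".join(blocks)


if __name__ == "__main__":
    pasted = (
        "Handling PDF\n"
        "The quick brown fox jumps over the lazy dog and\n"
        "keeps running through the field until it is\n"
        "tired.\n"
        "- first item in the list which is quite long\n"
        "- second item"
    )
    print(join_hard_breaks(pasted))
```

A paragraph whose last line happens to fill the full width will be merged with whatever follows, and words hyphenated at a line end are not rejoined, so the result still needs the human check described above.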

Automatic conversion

In many instances, an automatic conversion program is preferable. I have used both a “pure” PDF to Word converter (ScanSoft PDF Converter) and optical character recognition software (OmniPage and FineReader). You can find a healthy supply of both types by doing an Internet search on “PDF extraction” or “PDF conversion”.

What is the difference between them?

I have already said that PDFs are like printed documents. In most cases, however, the text is kept as computer characters, that is, you can copy and paste it. In some other cases, all the text (or some of it, in headings, for instance) is just an image, like the characters on a faxed page. “Pure” PDF converters can handle computer characters, but they choke on graphics. If all or part of the text in a document is stored as an image, they cannot process those areas. In this instance, optical character recognition (OCR) programs come to the rescue. They look at the page as if it were really a printed page, and they try to interpret it and convert it to computer characters. They may also extract the illustrations on the page. It is not necessary to print and scan the Acrobat pages for this. Modern OCR software accepts them directly.
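
One rough way to tell which kind of tool a file needs is to check how much text can actually be extracted from each page. The sketch below assumes the per-page text has already been obtained with some PDF library (pypdf is named in a comment only as one possibility). The function name `pages_needing_ocr` and the `min_chars` threshold are hypothetical choices for illustration.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return the 1-based numbers of pages with (almost) no extractable text.

    page_texts: one string (or None) per page, as returned by a text
    extractor. Obtaining it is outside this sketch; one assumed option is:
        from pypdf import PdfReader
        page_texts = [p.extract_text() for p in PdfReader(path).pages]
    min_chars: assumed threshold below which a page is treated as an image.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Ignore whitespace so that a page of blank lines counts as empty.
        visible = "".join((text or "").split())
        if len(visible) < min_chars:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    texts = ["Plenty of real text on this page, clearly.", "", "  \n ", None]
    print(pages_needing_ocr(texts))
```

Pages flagged this way are candidates for OCR, and the rest can go through a character-based converter. The check would miss the mixed case mentioned above, where the body text is extractable but the headings are images.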

PDFs can also be “password protected”. If you do not have the password, you cannot extract text from them. Character converters cannot process these files unless provided with the password. OCR converters can handle them perfectly, as they just “look at the pages” and do not use their internal character coding.

Conversion problems

The conversion is not without problems, though. PDFs are very capricious creatures indeed. They may come in all sorts of formats, from the usual letter-size page layout to complex (and big!) newspaper-like pages. I have often received “unfolded brochures” that contained four or more successive pages, one beside the other, forming one big PDF page, maybe 25 in. long. Often, Word cannot handle these unusual page formats. Tables, columns and boxes all present problems. The conversion programs can do an admirable job of rendering these printed formats as visually attractive Word documents, but under the surface the converted files are nightmarish: they are full of strange styles, inconsistent measurements, and unconventional character and line spacing, all for the sake of mimicking the original document as closely as possible.

Often, if you try to process these documents with Trados or any other computer-aided translation package, the stage is set for disaster. It’s too much like a house of cards: nice to look at, but you cannot touch it. The unavoidable modifications that a translation entails will quite probably play havoc with these documents.

Is there no hope, then? Not entirely: in difficult cases, I resort to manual extraction. Also, some extraction programs offer a menu of layout options for the converted file, ranging from full recreation of the original appearance to plain text extraction. I often find a middle-of-the-road selection satisfactory: I keep some typographical features (such as headers, bold type and italics) and suppress everything else.

Regenerating the original PDF document

In most cases, a translator simply cannot (and should not) attempt this. I have seen some well-meaning attempts to deliver a “PDF-like” document, and they were failures. The client (or the agency) has to provide for proper DTP if necessary. And, believe me, the DTP person will not thank you for any complex formatting in your document. More often than not, the very first thing that a DTP specialist has to do is eliminate all complex formatting from the translated file in order to be able to lay it out again.

However, in some special cases, you may provide a PDF document in order to protect your work, or as a sample (see below).

Using PDF to your advantage

Regardless of the format of the document you receive, you can use the Acrobat format to your advantage. Few translators own the full Acrobat “maker” program because it is relatively expensive (US$299 for the “Standard” version from Adobe) and there is apparently not much use for it. Nowadays there are many third-party PDF generators which, although they may not offer all the features of the Adobe product, are perfectly suitable for most jobs. Once again, you can find many of them by doing an Internet search on “PDF generator”. My personal favorite is PDF 995 at http://www.software995.com/, a very good program created by former Adobe employees.

And what would you want to generate PDFs for? Mainly for invoicing. Invoices in PDF are difficult to tamper with, can be printed as originals, and are accepted by many businesses as equivalent to their printed counterparts. You can password-protect them so that they cannot be modified, and you can even sign them electronically. Like all electronic documents, they are easy to file and retrieve. You can also send quotes and bids in PDF format: Word documents or e-mail messages cannot give the kind of long-term security that a PDF offers.

For the same reason, in some cases you can also send PDF files as a backup for Office documents or as an easily viewed complement to original DTP files (such as QuarkXPress or InDesign). Many users do not have the means to open these files, but they can easily check the PDF printed from them.

Source: http://www.proz.com/translation-articles/articles/480/1/Handling-PDF-Documents

(Editor: 译风翻译)
(Time: 2010-1-29 10:27:49)