Saturday, May 22, 2010

Extracting images from Word documents

Occasionally, we might need to extract images from Word documents. Traditionally, this meant copying each image from the document and pasting it into an image editor. With the Office Open XML format (.docx), life gets much easier.

In Word 2007 and above, the document is stored by default in the Office Open XML (.docx) format, which is essentially a zip archive of all the needed information. It contains the images, text and formatting neatly tucked away as a single document.

To extract the images, just rename the .docx file to .zip and extract the contents. The images, text and formatting XML appear as individual files under various folders; the embedded images are typically in the word/media folder.

From there one could use the embedded images.
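
The same steps can also be scripted, which saves the renaming when there are many documents. Below is a minimal Python sketch, assuming the images sit under word/media/ inside the archive (the usual location, though not guaranteed for every file). The function name extract_images and its parameters are made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_images(docx_path, out_dir):
    """Copy every file stored under word/media/ in a .docx into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    # A .docx is a zip archive, so it can be opened directly without renaming.
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Assumption: embedded images live under word/media/.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved
```

Calling extract_images("report.docx", "images") (both names are placeholders) would write the embedded images into an images folder and return the list of saved paths.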
