Thursday, May 18, 2006

Zip files for Office 12, DWF

I was reading recently about the new Office 12 XML format (Office Open XML)... And it struck me how close it was to the approach that Autodesk took with DWF...

It's a Zip file, with content and folders, as well as XML to describe the metadata.

In the case of DWF, that's property data, relationship data, etc.

In the case of DOCX (for MS Word 2007), it's metadata such as the core properties (Author, Summary, etc.), but also information about the sections, styles, etc.
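Because the container is an ordinary Zip archive, any Zip library can show you the parts and folders inside it. Here is a minimal Python sketch using the standard `zipfile` module. `list_parts` is a hypothetical helper name, and it assumes the file is Zip-based (true of a .docx, and of a DWF as described above):

```python
import zipfile

def list_parts(path):
    """Return (entry name, uncompressed size in bytes) for every entry
    in a Zip-based package such as a .docx or a Zip-based .dwf."""
    with zipfile.ZipFile(path) as zf:
        # infolist() returns one ZipInfo per entry, in archive order.
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

For a Word 2007 file you would typically expect to see entries such as `[Content_Types].xml`, `_rels/.rels`, `word/document.xml` and `docProps/core.xml` in the result.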

1. I think this is a great idea for all companies that publish structured data.
2. I wonder who thought of this first. I seem to recall SmarTeam's iXF standard had some of this approach as well...
3. How do you get access to this data? There are different approaches...
A. Use a Zip library (for example one built on zlib) to decompress and find the files, then use XML libraries to understand them.
B. In the DWF world, there's the DWF Toolkit (free), however I believe it's still C++ only, which puts an obstacle in front of us mere mortals (I could do it, but since C# has come out I've done so little C++ that it's withered on the proverbial vine... plus I'm in management now ;) ).
C. Most interesting, Microsoft is including a new namespace, System.IO.Packaging, in WinFX, which will provide transparent mechanisms for reading "packaged" data such as this... Which raises the question of whether DWF will be readable with this approach... It doesn't seem likely somehow, because the structures are not exactly the same, and System.IO.Packaging seems very much tied to the structure Microsoft is using in Office Open XML (the Open Packaging Conventions, with parts such as [Content_Types].xml at the root)... But here's hoping...
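Approach A can be sketched in Python using only the standard library (`zipfile` plus `xml.etree.ElementTree`). `read_core_properties` is a hypothetical helper, and the namespace URIs are those of the final Office Open XML spec (an assumption; the Office 12 betas may use different ones). Rather than hard-coding `docProps/core.xml`, it follows the package relationships in `_rels/.rels` to find the core-properties part, then collects its Dublin Core (`dc:*`) fields such as title and creator (author):

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed: namespace URIs from the final Office Open XML / OPC spec.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CORE_REL_SUFFIX = "/metadata/core-properties"
DC_NS = "http://purl.org/dc/elements/1.1/"

def read_core_properties(path):
    """Return a dict of Dublin Core fields (e.g. 'title', 'creator') from a .docx."""
    with zipfile.ZipFile(path) as zf:
        # Step 1: the package-level relationships part says where the core properties live.
        rels = ET.fromstring(zf.read("_rels/.rels"))
        target = None
        for rel in rels.iter(f"{{{RELS_NS}}}Relationship"):
            if rel.get("Type", "").endswith(CORE_REL_SUFFIX):
                # Targets are relative to the package root; drop any leading "/".
                target = rel.get("Target", "").lstrip("/")
                break
        if not target:
            return {}
        # Step 2: parse that XML part.
        core = ET.fromstring(zf.read(target))
    # Step 3: keep only the dc:* children, keyed by their local name.
    props = {}
    for child in core:
        if child.tag.startswith(f"{{{DC_NS}}}"):
            props[child.tag.split("}", 1)[1]] = (child.text or "").strip()
    return props
```

Following the relationship instead of assuming a fixed path is roughly the kind of bookkeeping a packaging API like the one in WinFX would do for you.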

Links of Interest:
Inside a DWF:
http://autodesk.blogs.com/between_the_lines/2005/03/a_look_inside_a.html
The Office XML Format:
http://www.microsoft.com/office/preview/itpro/fileoverview.mspx
Brian Jones, MS Product Manager, on Office XML:
http://blogs.msdn.com/brian_jones/archive/2005/06/06/425750.aspx

2 comments:

张明全 said...

Do you know how to extract files from a DWF? It seems that the Packaging namespace is incapable of doing this.

Matt Mason said...

Good question on what the right way is to do it now. Historically, I have used either just the Zip mechanism (through a 3rd party component like ComponentOne) - or used the free DWF Toolkit directly.
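For the Zip-only route, here is a minimal Python sketch with the standard `zipfile` module (not the ComponentOne component or the DWF Toolkit). It assumes the DWF is a Zip-based package as described in the post, and `extract_package` and its `suffix` filter are hypothetical:

```python
import zipfile

def extract_package(path, dest_dir, suffix=None):
    """Extract file entries of a Zip-based package (e.g. a DWF) into dest_dir.

    If suffix is given (e.g. ".xml"), only entries whose names end with it,
    case-insensitively, are extracted. Returns the extracted entry names.
    """
    with zipfile.ZipFile(path) as zf:
        # Skip pure directory entries; extractall() creates parent folders as needed.
        names = [
            n for n in zf.namelist()
            if not n.endswith("/")
            and (suffix is None or n.lower().endswith(suffix.lower()))
        ]
        zf.extractall(dest_dir, members=names)
    return names
```

For example, `extract_package("drawing.dwf", "out", suffix=".xml")` would pull out just the XML parts (file names hypothetical). Python's `extractall()` strips absolute paths and `..` components from entry names, so the output stays inside `dest_dir`.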