Tuesday, November 1, 2011

How to download multiple files as one zip with the Google Docs Java API

We have been working on a text mining project where data comes from documents stored in Google Docs. These documents are added and updated all the time. Since downloading each file separately is very slow, we wanted to download all new and modified files as one zip file, an option that is supported by the Google Docs API.
If you, like us, have been trying to download multiple files as one zip file using the Google Docs API, and your code does not work even though it should, here is why: gdata-docs-3.0.jar (release 1.46) has a bug, and has had it for a while. The good news is that it is easy to work around.
The bug is in DocsService.declareExtensions(). All Atom extension classes register themselves there so that the XML parser knows about the new node types. The call to ArchiveEntry.declareExtensions() is missing, so ArchiveEntry never registers itself as an Atom extension and the archive response cannot be parsed. The symptom is a log message like this:

com.google.gdata.util.XmlParser$ElementHandler getChildHandler
No child handler for archiveConversion. Treating as arbitrary foreign XML
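To see why a missing registration produces exactly this message, here is a toy model in plain Java of the lookup the parser performs. It is a simplification under stated assumptions, not gdata's actual ExtensionProfile code: each element name maps to a handler, and an element with no registered handler falls back to being kept as foreign XML. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Toy model (NOT the gdata implementation) of an extension profile:
 * element names map to handlers, and the parser asks the profile for a
 * handler each time it meets a child element.
 */
public class ExtensionProfileSketch {

  private final Map<String, Function<String, String>> handlers = new HashMap<>();

  /** Plays the role of a declareExtensions() call: register a handler for an element. */
  public void declare(String elementName, Function<String, String> handler) {
    handlers.put(elementName, handler);
  }

  /** Looks up the handler for a child element; unknown elements fall back to "foreign XML". */
  public String handleChild(String elementName, String content) {
    Function<String, String> handler = handlers.get(elementName);
    if (handler == null) {
      // Mirrors the symptom in the log: no child handler, so the element stays opaque XML
      return "foreign XML: <" + elementName + ">" + content + "</" + elementName + ">";
    }
    return handler.apply(content);
  }

  public static void main(String[] args) {
    ExtensionProfileSketch profile = new ExtensionProfileSketch();
    // Before registration, archiveConversion is not understood
    System.out.println(profile.handleChild("archiveConversion", "txt"));
    // After registration, the same element is handled by its parser
    profile.declare("archiveConversion", c -> "ArchiveConversion(" + c + ")");
    System.out.println(profile.handleChild("archiveConversion", "txt"));
  }
}
```

The workaround has the same effect as the declare() call in the toy: it adds the missing registration before the first ArchiveEntry response is parsed.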

The fix is to call:
DocsService gdocService = new DocsService("");
new ArchiveEntry().declareExtensions(gdocService.getExtensionProfile());

once, before you first request the creation of a zip file, so that the ArchiveEntry that Google Docs sends back can be parsed.
So the steps to create and download a zip file in Java are:
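// Imports (gdata-core-1.0.jar and gdata-docs-3.0.jar):
import java.io.File;
import java.net.URL;
import com.google.gdata.client.docs.DocsService;
import com.google.gdata.data.OutOfLineContent;
import com.google.gdata.data.docs.ArchiveConversion;
import com.google.gdata.data.docs.ArchiveEntry;
import com.google.gdata.data.docs.ArchiveResourceId;
import com.google.gdata.data.docs.ArchiveStatus;
import com.google.gdata.data.docs.DocumentListEntry;

// The statements below go in a method that declares
// throws IOException, ServiceException, InterruptedException.
// username, password, entry (a DocumentListEntry), downloadDir (a File)
// and downloadFile() are placeholders defined elsewhere in your code.

// Log into Google Docs
DocsService gdocService = new DocsService("");
System.out.println("Logging in with google docs...");
gdocService.setUserCredentials(username, password);

// this is the url to request the creation of a zip file
URL zipExportUrl = new URL("https://docs.google.com/feeds/default/private/archive");

/* register ArchiveEntry as an extension to the Atom format. Without this
 * fix, DocsService.insert() succeeds, but parsing the returned XML
 * fails. */
new ArchiveEntry().declareExtensions(gdocService.getExtensionProfile());

/* Create an ArchiveEntry describing what you want to download,
 * and the format conversions you request: */
ArchiveEntry archiveEntry = new ArchiveEntry();
// option to export all google docs as plain text; it must be added to the entry
ArchiveConversion gdoc2txtArchiveConversion = new ArchiveConversion(
    "application/vnd.google-apps.document",
    DocumentListEntry.MediaType.TXT.getMimeType());
archiveEntry.addArchiveConversion(gdoc2txtArchiveConversion);
// files that will be included in the zip file, identified by their resource id
// (repeat this call for every file you want in the archive)
archiveEntry.addArchiveResourceId(new ArchiveResourceId(entry.getResourceId()));

// request the creation of the zip file
ArchiveEntry zipEntry = gdocService.insert(zipExportUrl, archiveEntry);

ArchiveStatus.Value zipStatus = zipEntry.getArchiveStatus().getValue();

/* poll until the zip file is ready or the job fails, sleeping between
 * requests: of course you know better than using a spin lock ;-) */
while (zipStatus != ArchiveStatus.Value.FINISHED && zipStatus != ArchiveStatus.Value.ABORTED) {
  Thread.sleep(1000);
  zipEntry = gdocService.getEntry(new URL(zipEntry.getSelfLink().getHref()), ArchiveEntry.class);
  zipStatus = zipEntry.getArchiveStatus().getValue();
}

if (zipStatus == ArchiveStatus.Value.FINISHED) {
  // url where the zip file is now available
  String resUrl = ((OutOfLineContent) zipEntry.getContent()).getUri();
  File path = new File(downloadDir, "googledocs.zip");
  downloadFile(resUrl, path);
}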
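If you would rather not hand-roll the wait loop, a small polling helper with a timeout and a capped exponential back-off is one option. This is a minimal sketch in plain Java, not part of the gdata API; the names Poller, StatusSource and pollUntil are hypothetical, and the delays are up to you.

```java
import java.time.Duration;
import java.util.function.Predicate;

/**
 * Sketch of a polling helper: re-fetches a status until isDone accepts it,
 * sleeping between attempts with a capped exponential back-off, and gives
 * up with an exception once the timeout has elapsed.
 */
public final class Poller {

  /** Like Supplier, but allowed to throw, so it can wrap remote calls. */
  @FunctionalInterface
  public interface StatusSource<T> {
    T fetch() throws Exception;
  }

  private Poller() {}

  public static <T> T pollUntil(StatusSource<T> source, Predicate<T> isDone,
      Duration initialDelay, Duration maxDelay, Duration timeout) throws Exception {
    long deadline = System.nanoTime() + timeout.toNanos();
    long delayMillis = Math.max(1, initialDelay.toMillis());
    long capMillis = Math.max(1, maxDelay.toMillis());
    T status = source.fetch();
    while (!isDone.test(status)) {
      // overflow-safe comparison of nanoTime values
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("timed out, last status: " + status);
      }
      Thread.sleep(delayMillis);
      // double the pause after each attempt, but never beyond the cap
      delayMillis = Math.min(delayMillis * 2, capMillis);
      status = source.fetch();
    }
    return status;
  }
}
```

With the gdata client, source could wrap the gdocService.getEntry(...) call from the loop, and isDone could accept an entry whose status is ArchiveStatus.Value.FINISHED or ABORTED. The timeout keeps a stuck archive job from blocking your program forever.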
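The downloadFile() helper is not shown here. The zip URL has to be fetched with the same credentials, so with the gdata client the input stream would typically come from the authenticated service, along the lines of creating a MediaContent, calling setUri(resUrl) on it and then gdocService.getMedia(mediaContent).getInputStream() (check the gdata javadoc for your release). The sketch below covers only the standard-library part, under that assumption: it saves whatever stream you obtain to disk and lists the zip entries, which also confirms the download is a readable zip. ZipDownload, saveStream and listEntries are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helpers for the download step: save the stream, then inspect the zip. */
public final class ZipDownload {

  private ZipDownload() {}

  /** Copies the whole stream to target, replacing an older download, and closes the stream. */
  public static long saveStream(InputStream in, Path target) throws IOException {
    try (InputStream source = in) {
      return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  /** Returns the entry names in the zip; throws if the file is not a valid zip. */
  public static List<String> listEntries(Path zipPath) throws IOException {
    List<String> names = new ArrayList<>();
    try (ZipFile zip = new ZipFile(zipPath.toFile())) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      while (entries.hasMoreElements()) {
        names.add(entries.nextElement().getName());
      }
    }
    return names;
  }
}
```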