Getting a Test File

Note: quite a lot of time has passed between my writing this and finding the time to polish it and get it online. During this period, Amazon has improved the file format used by Kindles (they currently use KF8), which means that a lot of the limitations I mention no longer apply (at least for newer devices). They have also improved the tools used to create mobi files. One big difference is that you can now convert from epub. This has quite a few advantages, and future versions of this plugin will take that route, but for now, this method still works.

Right, things are about to get a bit hairy. What we are going to do is write an export plugin for LibreOffice that will convert our novel to XHTML suitable for submission to the Amazon Kindle store. The first thing we need to know is how LibreOffice currently lays out the novel. A simple way to find out is to look at the XML inside a .odt file in a text editor (such as gedit). You can either use my test document below or use your actual novel. An .odt file is really a zip archive, so to extract your novel’s text (using Ubuntu Linux), simply right-click on the .odt file and open it with Archive Manager. Select the content.xml file and extract it somewhere.
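If you aren’t on Ubuntu, or just prefer a script, the same extraction can be done with Python’s standard zipfile module, since an .odt file is an ordinary zip archive with content.xml at its root. This is a minimal sketch; extract_content is a hypothetical helper name of my own, not part of LibreOffice or any tool mentioned here.

```python
import zipfile
from pathlib import Path


def extract_content(odt_path: str, out_dir: str = ".") -> Path:
    """Copy content.xml out of an .odt file (a zip archive).

    Hypothetical helper for illustration; returns the path of the extracted file.
    """
    with zipfile.ZipFile(odt_path) as odt:
        # The document's text lives in content.xml at the root of the archive.
        data = odt.read("content.xml")
    out_path = Path(out_dir) / "content.xml"
    out_path.write_bytes(data)
    return out_path
```

For example, extract_content("myNovel.odt") would write content.xml into the current directory.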

The contents of an .odt file. The text is stored in content.xml.

Before you open it in your text editor, we need to make it a bit more human-friendly. Currently all the text is on one very long line, which is a bit difficult to read and may well exceed the maximum line length your text editor can cope with. To neaten it up, just pass it through tidy (if you can’t install tidy for some reason, there is an online version). This will add line breaks and indentation that make the file much more readable.


tidy -xml -i -o testBook.xml content.xml
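If you can’t get tidy at all, Python’s standard library can produce a similar indented copy. This is a rough stand-in, not an exact equivalent: minidom’s layout differs from tidy’s, and, as with tidy, the added whitespace is there purely for readability. The function name pretty_print is my own, hypothetical choice.

```python
from xml.dom import minidom


def pretty_print(src: str, dest: str) -> None:
    """Write an indented copy of src to dest (a rough stand-in for `tidy -xml -i`)."""
    dom = minidom.parse(src)
    # With an encoding given, toprettyxml returns bytes, including an XML declaration.
    with open(dest, "wb") as out:
        out.write(dom.toprettyxml(indent="  ", encoding="utf-8"))
```

Calling pretty_print("content.xml", "testBook.xml") gives you the same file name as the tidy command.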
 

You should now have a file called testBook.xml. Open it (or my test document below) in gedit or the text editor of your choice.

A document for testing

You will see that the file starts with the usual XML declaration, followed by a load of namespace declarations and then lots of style bookkeeping. In my test document, I have also added a stylesheet declaration to make testing easier.

The xml and namespace declarations in test.xml

We’ll come back to these in a minute. The actual guts of your book start inside the <office:text> element, which is itself inside <office:body>.
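To get a feel for that structure without scrolling through the whole file, here is a minimal sketch using Python’s ElementTree that finds <office:text> and lists each heading (text:h) and paragraph (text:p) with its style name. The namespace URIs are the standard ODF ones declared at the top of content.xml; list_paragraphs is a hypothetical helper, and it doesn’t try to handle nested paragraphs such as those inside footnotes specially.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs (as declared at the top of content.xml).
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def list_paragraphs(path: str) -> list[tuple[str, str]]:
    """Return (style name, plain text) for each heading/paragraph in <office:text>."""
    root = ET.parse(path).getroot()
    body_text = root.find("office:body/office:text", NS)
    if body_text is None:
        raise ValueError("no <office:text> element found")
    style_attr = f"{{{NS['text']}}}style-name"
    wanted = (f"{{{NS['text']}}}p", f"{{{NS['text']}}}h")
    result = []
    for el in body_text.iter():
        if el.tag in wanted:
            # itertext() joins the text of nested elements such as <text:span>.
            result.append((el.get(style_attr, ""), "".join(el.itertext())))
    return result
```

Because it searches relative to whatever the root element is, the same function works whether the root is <office:document-content> or <office:document>.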

The text of our novel is in <office:text>.

One thing that might surprise you is that quite a lot of your lovely styles seem to have gone missing, replaced by styles with names like “P1”. Don’t despair, this is normal. These oddly named styles are automatic styles. By default, LibreOffice uses automatic styles to make it easier to compare documents. You can turn this off in LibreOffice 5 with the catchily titled “Random number to improve accuracy of document comparison: Store it when changing the document” setting. Be aware, though, that this won’t delete any automatic styles that already exist in a document. If you are planning on editing the plugin a lot, you will find it simpler to create a document that doesn’t use automatic styles. Otherwise, don’t worry too much about them; they just make the plugin a bit harder to read.
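An automatic paragraph style usually records the named style it was derived from in its style:parent-style-name attribute, so you can often recover the style behind a name like “P1”. This is a minimal sketch with Python’s ElementTree; automatic_style_parents is a hypothetical helper, it only looks inside <office:automatic-styles> in this one file, and it skips automatic styles that have no parent (many text-span styles such as “T1” don’t).

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def automatic_style_parents(path: str) -> dict[str, str]:
    """Map each automatic style name (e.g. "P1") to its style:parent-style-name."""
    root = ET.parse(path).getroot()
    auto = root.find(f"{{{OFFICE}}}automatic-styles")
    parents: dict[str, str] = {}
    if auto is None:
        return parents
    for st in auto.findall(f"{{{STYLE}}}style"):
        name = st.get(f"{{{STYLE}}}name")
        parent = st.get(f"{{{STYLE}}}parent-style-name")
        # Only record styles that actually point back at a named style.
        if name and parent:
            parents[name] = parent
    return parents
```

Note that LibreOffice stores spaces in style names as “_20_”, so a parent of “Text_20_body” is your “Text body” style.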

The reason we are looking at these files is that, when LibreOffice exports a file, it first converts it in memory to almost exactly this format before applying the export transformation, so these files are effectively what our plugin will need to transform. The only difference is that <office:document-content> is replaced by <office:document>. I don’t know why. So, if you are testing with your own novel’s text, do a quick search and replace of “office:document-content” with “office:document” (make sure you replace it in both the opening and closing tags). Whilst you’re at it, add a stylesheet declaration right after the XML declaration. We can then conveniently test our plugin as we write it by opening the test file in a browser.


<?xml-stylesheet type="text/xsl" href="odtToXhtml.xsl"?>

This assumes that the stylesheet we are about to write will be called “odtToXhtml.xsl” and will sit in the same directory as the test file. As of this writing, this works fine with Firefox on Linux but not with Chrome. Other browsers and platforms are untested. Testing the transformation in a browser as we go means that, once the plugin is finished, we can install it into LibreOffice reasonably confident that it will work as expected.
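If you’d rather not do the search and replace by hand, here is a minimal Python sketch of both steps: renaming the root element in its opening and closing tags, then inserting the stylesheet declaration right after the XML declaration. prepare_test_file is a hypothetical helper, and it assumes the root element is the only place “office:document-content” appears as a tag name. It leaves the file alone if a stylesheet declaration is already present, as it is in my test document.

```python
import re

STYLESHEET_PI = '<?xml-stylesheet type="text/xsl" href="odtToXhtml.xsl"?>'


def prepare_test_file(xml_text: str, pi: str = STYLESHEET_PI) -> str:
    """Rename the root to <office:document> and attach the stylesheet for browser testing."""
    # Rename the root element in both the opening and closing tags.
    xml_text = re.sub(r"<(/?)office:document-content\b", r"<\1office:document", xml_text)
    if "<?xml-stylesheet" in xml_text:
        return xml_text  # already has a stylesheet declaration
    # The stylesheet declaration goes right after the XML declaration, if there is one.
    decl = re.match(r"\s*<\?xml\s[^?]*\?>", xml_text)
    if decl:
        return xml_text[: decl.end()] + "\n" + pi + xml_text[decl.end():]
    return pi + "\n" + xml_text
```

To use it, read testBook.xml into a string, pass it through prepare_test_file and write the result back out before opening the file in Firefox.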
