Ok, so you have your novel. It is made up of some text and some graphics, either vector, raster or both. Now your novel needs to be stored in a file on your computer. Files can be divided into two main types: text and binary. A text file is, unsurprisingly, a file that contains text. By text I mean that the low-level 0s and 1s stored on the computer’s disk can be mapped, in a meaningful way, to the characters of a character set. To be clear, a text file isn’t one which contains text when you open it in a word processor (though it might). It is one which contains text when you open it in a text editor. A binary file might also contain text, but it needs another level of processing to take the binary information and turn it into character data.
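If you want to see that distinction in practice, here is a rough sketch in Python. The function name `looks_like_text` is my own invention, and the test it applies (the bytes decode as UTF-8 and contain no NUL bytes) is only a rule of thumb that I am assuming is good enough for our purposes, not a formal definition of a text file.

```python
def looks_like_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Rule of thumb: the bytes decode cleanly and contain no NUL bytes."""
    # NUL bytes almost never appear in text, but are common in binary data.
    if b"\x00" in data:
        return False
    try:
        # If every byte sequence maps to a character, treat it as text.
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def file_looks_like_text(path: str) -> bool:
    """Apply the same rule of thumb to a whole file on disk."""
    with open(path, "rb") as handle:
        return looks_like_text(handle.read())
```

Calling `file_looks_like_text` on a plain .txt file should give `True`, while a file full of arbitrary binary data will usually give `False`.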
Why is this important? Because, in order to create our converter plugin for LibreOffice, we are going to want to have a good poke about inside the internals of our novel’s file so we can see how the format works, and if we can’t see those internals because the file type is binary, we are going to have problems. We are also going to want to poke about inside our converted file to make sure everything is indeed working as intended, and we want to be able to use standard system tools to create and check it.
Now, with all these advantages, you might wonder why anyone would use a binary format. It will come as no surprise to most of you, though, to learn that one example of a binary file is anything saved in Microsoft Office’s older .doc format. You can see this for yourself by saving a copy of a file in LibreOffice using the .doc format and trying to open it in a text editor (such as gedit).
Binary files are preferred by Microsoft for almost exactly the same reason that everyone else prefers text files: they make it harder for people to write programs that are compatible with the format. That means you have to purchase specialised programs from them to accomplish simple tasks, rather than using the tools provided by your operating system, a third party or yourself. You will find that lots of file formats used by older big software companies are binary for this reason.
The only advantages binary files have over text files (for the user) are that they are (usually) smaller, (possibly) faster and that they make it easier to combine different files into one (e.g. your novel and your cover). There are various ways this can be done in a text file, but they often result in something that is not readable by standard tools.
A fairly simple way for text files to get most of the advantages of binary files is to split the information into several different files and store them in a directory that is then compressed into a zip archive. This gives a small binary file that is so easy to turn back into a series of text files that it can, for all intents and purposes, be considered a text file. This is the technique used by LibreOffice for its .odt file format. If you like, you can make a copy of one of your files, decompress it and have a poke about inside, as, in a minute, we are going to be looking at some of these files (don’t worry if you don’t know how to do this, as I’ll be showing you later).
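To give a feel for how little work it takes to get from a zip archive back to text, here is a small sketch using Python’s standard `zipfile` module. The helper names (`list_members`, `read_text_member`, `build_demo_archive`) and the demo file names are made up for illustration. The demo builds a tiny archive in memory rather than using a real .odt, so it says nothing about what you will actually find inside one of your own files.

```python
import io
import zipfile


def list_members(archive) -> list:
    """Return the names of the files stored inside a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def read_text_member(archive, name: str, encoding: str = "utf-8") -> str:
    """Decompress one member of the archive and decode it as text."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name).decode(encoding)


def build_demo_archive() -> io.BytesIO:
    """Build a small in-memory zip holding two made-up text files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("chapter1.txt", "It was a dark and stormy night.")
        zf.writestr("notes/cover.txt", "Cover image goes here.")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = build_demo_archive()
    for member in list_members(demo):
        print(member, "->", read_text_member(demo, member))
```

Both helpers accept either a file-like object or a path, so passing the path to a copy of one of your own .odt files to `list_members` should show you the files packed inside it.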