Wednesday, December 16, 2015

The Udaan publishing story

Udaan is the publication of the STC India annual conference 2015. It was released at the conference, which was held from Dec 10 through 12 in Pune. This blog post talks about the backend publishing process.

Udaan, the name, was conceived by the conference program manager, Mugdha Kulkarni. When she handed the Udaan brief to us, Mugdha was very clear:
  • It should be something that people will keep on their desks for at least a month or two. Not something that goes to the raddiwala (the scrap dealer) on Day 2 of the conference.
  • If we're having an online version, it should really be a complete digital experience. Not boring web pages.
The first item on that list could be addressed by getting good articles. We spread the word around, and when the articles started coming in, and we began to see all the non-boring possibilities in them, it became clear that *just* HTML won't do. We wanted interaction and, consequently, JavaScript and styling.

Which became something of a problem, initially. The agency that did the conference web presence put up their hands. What we were asking for needed "programming knowledge", they said, and that was beyond the scope of free services. Like someone from Allahabad would say, "Itne mein to, bhaiyya, bas itna hi ho payega." (For this much money, brother, this is all you'll get.)

Which meant, hosting Udaan on stc-india.org, where the other conference pages resided, would be in direct contravention of the second item in our brief: non-boring web pages. A different hosting site was needed. The next choice seemed easy: a blog. Either WordPress or Blogger. But wait! We need JavaScript. Neither free WordPress nor Blogger is JS-friendly. So, no, we need some other solution. Enter GitHub.

O dear GitHub, how do I love thee? Let me count the ways:
  • It is free.
  • It gives you a repository. You store all your content there. And content means *anything*. Any image, any text, any code, any sound, any movie...
  • It gives you one-click publishing (and a public URL) for your repository content.
  • It gives you stuff like version control and multi-author checkins and checkouts.
  • It has its own bugtracker (which we used extensively during the review period).
  • It takes content out of your laptop (which can crash any day, or whose owner might break an arm) and puts it on its server (which can also crash but at least the hospital scenario is unlikely).
Perfect. So, GitHub was to be the platform.

Now to the content.

Everything is in HTML. That means, in the beginning, the only tool I needed was Programmer's Notepad 2. Even plain Notepad would've been sufficient, but line numbers and word wrap are handy! The authors sent in their articles in MS Word and I did a paragraph-by-paragraph copy-paste to HTML. Why didn't I just do a Save As in MS Word and generate an HTML file? Because, when an MS Word file is converted to HTML, a lot of unnecessary styling classes get added and, consequently, the end result is not "clean" HTML.

For the apparent "programming knowledge", I just googled. I had no intention of reinventing the wheel. You'll see that Udaan uses JavaScript snippets that are freely available on the net. Udaan also uses styling elements from free CSS available on the net. The main look-and-feel is from the w3schools CSS; I chose it not only because it is the place where many of us learnt our HTML, but also because their CSS uses responsive design principles. Udaan also uses icons and images available freely on the net. In short, Udaan uses free stuff, thanks to helpful people who think about giving back to the community and therefore share their work so freely.

But let me talk about content referencing. It's something that's easy in DITA but not so easy in HTML. For consistency, I wanted to use boilerplate text for the headers and footers, and I needed a referencing mechanism that does not need a Ctrl-C + Ctrl-V. The solution that I used is this:
  1. Make HTML files with the boilerplate text. For Udaan, the header file contains the Michi masthead, the clickable Udaan heading, and the ToC (accessed through the hamburger menu). The footer file contains the conference date banner, sponsor logos, and code for the scroll-to-top button.
  2. Place the boilerplate files in the same folder that contains the other HTML files that will "call" the boilerplate text. This is important. The referencing mechanism does not work beyond one folder level.
  3. In the file that calls the boilerplate text, insert a <script> tag like this:
  4. At the place where the boilerplate text is needed in the calling file, create a <div> tag, like this:
Notice that the "id" attribute of the <div> element is the same as the # name that I gave in my <script> tag.
Notice also that the <div> tag has no content and looks like an empty tag. But, when the file is rendered in a browser, the actual contents of the file linked to that <div> element will be shown in this <div>.

This content referencing works perfectly off a server (such as a GitHub content repo) but not in a local file system, so if you're trying this out on your laptop, 8 times out of 10, this won't work. You'll need to test it on a server. I don't know why it works 2 times out of 10.
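
The snippets for steps 3 and 4 are not shown in this post. Here is a minimal sketch of how such a calling file could look, assuming that jQuery's .load() method is what fetches the boilerplate. The file names header.html and footer.html, the ids, and the jQuery CDN URL are illustrative assumptions, not necessarily what Udaan uses:

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Calling page (illustrative)</title>
  <!-- Assumption: the full jQuery build is loaded from a CDN.
       The "slim" build will not do, because it leaves out .load(). -->
  <script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
  <script>
    // Once the page is ready, fetch each boilerplate file and place its
    // contents inside the <div> whose id matches the # name used here.
    $(function () {
      $("#header").load("header.html");
      $("#footer").load("footer.html");
    });
  </script>
</head>
<body>
  <!-- Empty placeholders; they are filled in when the page is rendered. -->
  <div id="header"></div>
  <p>Article content goes here.</p>
  <div id="footer"></div>
</body>
</html>
```

In a sketch like this, the boilerplate is fetched by a script request, and browsers generally restrict such requests on pages opened straight from the local file system. That is a likely (though unconfirmed) explanation for the local failures described above.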

Let me also talk about folder structure. To maintain our sanity, instead of having just one folder that contains all of the files, I created several folders: each folder holds only one kind of content. I thought it would be more intuitive for a multi-author environment. I also assumed that Udaan would continue to live, and that one day someone else might be managing these files, so I created a README file that explains the folder structure, and how the files inside these folders link to each other (for example, what is file X in folder Y doing here).


Which brings me to permissions. Only the editors (Nibu Thomas and I) have write permission to the GitHub project. Because I own the project, I can add others to it, and adding such people will automatically give them write permission. Everyone else has read permission, which is the default GitHub permission for the whole world. If Udaan is managed by someone else in future, I can "transfer" the project (and, thus, ownership) to that person. The files still remain where they are, the Udaan articles still continue to be read by the world.

People with read permission cannot directly modify the files but they can still file bugs. So, we asked all authors to review their own articles and also do peer-review, and open bug reports for all their suggested changes. By doing this, we broke free of email-jail. Everything pertaining to Udaan was captured right there, in the Udaan repo bugtracker, open for the whole world to see.


I used GitHub's internal publishing method, so it's really a one-click thing. The moment you push a change into the publish-branch of the repo, that very moment the public URL is refreshed with the changed content. So, what's a publish branch? Well, in a GitHub repo, you can have several branches (which correspond to "streams" or "forks" in other repo parlance) but only the stuff that you put in a branch called "gh-pages" is the stuff that GitHub will publish to that external URL it gave you. It's somewhat like the DITA authoring scene: you can have a thousand files in a hundred folders but only those files that you put in a .ditamap file are the ones that get published.

Which brings me to the published web pages. For starters, I wanted the Udaan web page to "look" different from other open browser tabs, so I used a favicon.


You see tiny Michi on the Udaan tab? That's what we call a favicon. I couldn't get the favicon to show up in Chrome or IE. I scoured the net for a solution and tried every suggestion but nothing worked. If you know of a way, let me know. Meanwhile, here are the steps to get a favicon on a Firefox tab:
  1. Choose an image file that's no larger than 16 x 16 pixels.
  2. Convert it to an .ico file. There are several google-able online free services that'll do this in a trice.
  3. Save the favicon in the root folder. Ideally, name the file favicon.ico.
  4. In all of the files, in the <head> section, add a link to the favicon file, like this:
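
The link from step 4 is not shown in this post. A minimal sketch of what such a <head> section could contain, assuming the icon is named favicon.ico and sits in the same folder as the page (a page in a subfolder would need a path such as ../favicon.ico):

```html
<head>
  <meta charset="utf-8">
  <title>Udaan</title>
  <!-- Assumption: favicon.ico is in the same folder as this page. -->
  <link rel="icon" type="image/x-icon" href="favicon.ico">
  <!-- Some older browsers look for the "shortcut icon" form instead. -->
  <link rel="shortcut icon" type="image/x-icon" href="favicon.ico">
</head>
```

Whether the second form would have fixed the Chrome and IE problem described above is untested here; browsers also cache favicons aggressively, so a hard refresh may be needed before a change shows up.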

After the browser tab, the actual content. I thought we should apply all the good-technical-writing stuff we know, so here's what Udaan has:
  • The first paragraph, just after the title and author name, is either a summary of the entire article or a teaser to entice someone to read further. This is not only good SEO but akin to the <shortdesc> tag of DITA that everyone keeps telling me is good writing.
  • This first paragraph is tweetable on click. The code is:
    Notice the "href" attribute. The first part, https://twitter.com/intent/tweet?, is what makes a Twitter box pop up. The next part, text=...whatever..., is what auto-populates the Twitter box. Notice also that characters other than letters, digits, commas, and periods are percent-encoded, that is, written as a % sign followed by the character's hexadecimal ASCII code. Thus, a space is represented by %20. Want to try out such clickable sentences in your semi-formal documentation such as customer-facing blogs?
  • No page is an "orphan". Every Udaan article links to some other related article.
  • Every page is page one, so each article is complete in itself, with full header, footer, author details, content, acknowledgement, references, related links, and a ToC.
  • Progressive disclosure techniques were applied. You don't need to leave the page to read stuff like author profile (shown on mouseover) or to listen to audiocasts and watch video versions (shown within popup boxes). Footnote text is revealed on mouseover, so you don't have to jump back and forth on the page. Every link has a tooltip that shows the first sentence (the summary or teaser sentence) of that article, so you can decide before you click whether to go to that article. (To see all of these together on one page, see Asha Mokashi's leadership article.)
  • Every page has a unique <title> tag (which won't be rendered on the browser page but is anyway picked up by search engines).
  • Every URL has descriptive link text.
  • For universal accessibility, we included voice versions. You are no longer tied down to a read-the-text scenario.
  • Voice interviews have the transcripts included, so you can read them if you don't want to listen in.
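
The click-to-tweet code mentioned in the list above is not shown in this post. A minimal sketch of such a link, using a sentence from this post as placeholder text; the title attribute and the wording are illustrative assumptions:

```html
<p>
  <!-- The text parameter holds the percent-encoded sentence to tweet;
       every space becomes %20. -->
  <a href="https://twitter.com/intent/tweet?text=Udaan%20is%20the%20publication%20of%20the%20STC%20India%20annual%20conference%202015."
     title="Click to tweet this sentence">
    Udaan is the publication of the STC India annual conference 2015.
  </a>
</p>
```

If hand-encoding gets tedious, JavaScript's built-in encodeURIComponent() function produces the encoded string for any sentence (it also encodes commas, which is harmless).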
A word about the multimedia. The audio files are playable on click, achieved by using the embedded player feature of HTML5. The code is:

Notice the "controls" attribute of the <audio> tag. That's what specifies whether the player controls like Play and Pause should be displayed. Notice also the text within the <audio> tag. That's what's displayed if a browser cannot handle multimedia. I used words that describe what's gone wrong and what could be a possible solution.
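
The audio snippet is not shown in this post. A minimal sketch of an embedded HTML5 player with the controls attribute and fallback text; the file path and the fallback wording are placeholders, not Udaan's actual ones:

```html
<audio controls>
  <!-- Assumption: audio/article.mp3 is a placeholder path. -->
  <source src="audio/article.mp3" type="audio/mpeg">
  <!-- Shown only by browsers that do not support the audio element. -->
  Your browser cannot play this audio. Try a newer browser, or
  <a href="audio/article.mp3">download the MP3 file</a> instead.
</audio>
```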

Udaan was to live online and its pages had several links. To eliminate broken links, I used Xenu's Link Sleuth. I tried making a custom 404 page but discovered that's possible only with paid GitHub. Since the entire purpose of using Xenu was to eliminate a 404 scenario, this didn't bother me much.

So much for the web version. Let me move on to the .epub and .azw versions, to be used on iPads and Kindles.

I used Sigil to create an EPUB file. I imported the HTML files into Sigil, and generated the EPUB output. I then imported this EPUB into Calibre and generated an AZW file.

This part took longer than I expected. An EPUB file is just an archive file (just like a .zip file is) that contains XHTML for the basic content, and some other folders and files that tell computers that this is an EPUB file. The folder structure is fixed, and Sigil generates them for you.


All you need to do is put your content in the appropriate folders: audio, images, styles, and text. If I was embedding a video clip, there would've been another folder named "Video".

The difficult part was the styling. I discovered that the CSS that looks great on a laptop looks awful in an e-reader. The task of manually stripping the HTML tags of all styling took about 2 hours. (This is where I miss DITA.) I created a CSS file just for the EPUB, and specified some very basic styling such as line spacing and a background colour for headings. The rest of the output is the default HTML style. The end result might look plain on a computer screen but looks clean and neat on iPads and Kindles.
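
To give an idea of how little styling that is, here is a minimal sketch of "very basic" rules of this kind, written inline in a <style> element for brevity. The selectors and values are illustrative assumptions, not Udaan's actual EPUB stylesheet:

```html
<head>
  <title>Article title</title>
  <style>
    /* Assumption: illustrative values only. */
    body { line-height: 1.5; }                            /* line spacing */
    h1, h2 { background-color: #dddddd; padding: 0.2em; } /* heading background colour */
  </style>
</head>
```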

For the audio, I used Audacity. Thanks are due to these people: Prachi Karnik, Deval Faldu, Santosh Krishnamurthy, Mihir Mishra, Mayuri Baruah, and Jolein Vadhariya; they gave their time to record someone else's articles. Another instance of giving back to the community. After the sound bites were recorded, I discovered they'd been done in various file formats, so I first used Audacity to convert them to .mp3. Then I googled for how to filter out background noise, how to trim the too-long silences, and how to amplify the quiet bits. The sound files still lack that "professional" feel but I think they serve our purpose fine.

And Udaan was ready to be released.

What didn't work well:
  • The favicon. Like I said, I just couldn't get it to work on Chrome and IE. Also, I had to manually insert the favicon link in every HTML file; inserting it only in the boilerplate header file did not work.
  • The author profiles. I did a copy-paste from the home page to the individual articles. The content-referencing just did not work with the CSS I was using to show the profiles on mouseover. Maybe I need to create separate files for every author (in other words, maybe HTML content referencing works only for a *file* and not for an *anchor* within a file).
  • The multimedia in the EPUB file plays only with the iBooks app. I couldn't figure out a way to make it play with the default iPad reading app.
  • For the print version, there wasn't an easy way to port content from HTML to MS Publisher. Porting was done manually by Sangeeta Raghu Punnadi. Then, Nibu Thomas did the layout and inserted the sponsor ads and the fillers. It was not easy; "nightmare" was a word that was used several times. It also meant that content lived in two places: GitHub and Publisher. We couldn't figure out a way to create a "book" of discrete HTML files and turn that into a format that our print vendor could use.
  • I could not directly make an EPUB or AZW file from GitHub, which allows only HTML publishing, not other types of publishing.
What would've been nice to have but no time, no money, etc:
  • A copybook-like underline style looks charming on iPads.
  • The print version could also have had some interactivity. For example, think of Anagha's article on commonly-confused words. Now, think of a book page where you first read the tip, then lift a flap to see a cartoon underneath on the very same page.
  • The print version could've had perforated pages for the blank pages (where you could've taken notes during the conference) and for the checklists in Rajib's leadership article (which you could've tacked to your workdesk for ready reference).
What software was used:

Which free CSS and JS files were used:
So, yeah, that's the Udaan back-end story. The entire source code of Udaan is on this GitHub repo: https://github.com/UdaanSTC/GutsAndGlory/tree/gh-pages.

And Udaan itself is here: Udaan.

Here's what Udaan Print looks like: 100 pages, A5, hardbound, all colour. Distributed to the 350-odd people who came to the conference.




To see a 3-minute video on the Udaan story, see this:




* iBooks EPUB picture is from https://www.ibm.com/developerworks/community/blogs/aimsupport/entry/cics_transaction_server_now_has_epub?lang=en
** flap-in-book picture is from http://www.booktryst.com/2011/04/anatomy-gets-animated-in-rare-flap.html