Saturday, December 20, 2008

How to mechanise the harvesting

The process we followed for Phase 1 of the TWIN eBook was:

  1. Copy-paste the useful content into Google Docs files, one file per category.

  2. In RoboHelp, create an HTML file for each question in a category, and in the navigation tree, group all the questions of a category under that category.
    Saving the Google Docs files as Word files and importing them into RoboHelp didn't work, because during the Google Docs > Word conversion the formatting (and hence the sequence of questions and answers) went for a toss.

  3. Export the RoboHelp files as Word files and proofread them.

  4. In RoboHelp, incorporate the edits from stage 3.
    This means comparing the text line by line.

  5. Generate CHM.



To me, this seems like unnecessary labour; stages 2 and 4 are especially wasteful. So, over the past several days, I've been thinking of changing the eBook process so that we can:

  • eliminate the duplication of copy-paste at the compilation stage

  • make ongoing edits



I've come up with the following plan:

  1. Use a database (a simple row-column table) to store the data. The database needs to let us:

    • Append records through a Web-based form. Fields: category_main, category_sub, question, answer 1_name_email_date, answer 2_name_email_date, answer 3_name_email_date, etc.

    • Search by field

    • Define user levels: a writer can put and get; an editor can put, get and change.


  2. For compiling, pull the data from the database and wrap it in XML tags, one tag per column.

  3. Transform the XML to HTML and compile it as a CHM. The "category_main" and "category_sub" tags define the TOC.
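A minimal sketch of steps 1 and 2 of the plan, assuming SQLite (through Python's built-in sqlite3 module) as the simple row-column database. As a design assumption not in the plan, the schema stores one row per answer, with name, email and date in their own columns, instead of packed "answer 1_name_email_date" columns; that way a question can have any number of answers. The role table, the function names (put, search, change, export_xml) and the <ebook>/<entry> wrapper elements are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one row per answer; columns mirror the plan's fields.
COLUMNS = ("category_main", "category_sub", "question", "answer", "name", "email", "date")
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id INTEGER PRIMARY KEY,
    category_main TEXT NOT NULL,
    category_sub TEXT NOT NULL,
    question TEXT NOT NULL,
    answer TEXT NOT NULL,
    name TEXT,
    email TEXT,
    date TEXT
)
"""

# User levels from the plan: a writer can put and get; an editor can also change.
PERMISSIONS = {"writer": {"put", "get"}, "editor": {"put", "get", "change"}}


def require(role, action):
    """Raise PermissionError if the role may not perform the action."""
    if action not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"{role!r} may not {action}")


def check_field(field):
    """Only known column names are ever interpolated into SQL."""
    if field not in COLUMNS:
        raise ValueError(f"unknown field: {field!r}")


def put(conn, role, **fields):
    """Append one row, as the Web-based form would."""
    require(role, "put")
    for field in fields:
        check_field(field)
    cols = ", ".join(fields)
    marks = ", ".join("?" for _ in fields)
    conn.execute(f"INSERT INTO entries ({cols}) VALUES ({marks})", tuple(fields.values()))


def search(conn, role, field, text):
    """Return rows whose given field contains the text."""
    require(role, "get")
    check_field(field)
    return conn.execute(f"SELECT * FROM entries WHERE {field} LIKE ?", (f"%{text}%",)).fetchall()


def change(conn, role, entry_id, field, value):
    """Edit one field of an existing row (editors only)."""
    require(role, "change")
    check_field(field)
    conn.execute(f"UPDATE entries SET {field} = ? WHERE id = ?", (value, entry_id))


def export_xml(conn):
    """Wrap every row in an <entry> element, with one child tag per column."""
    root = ET.Element("ebook")
    query = f"SELECT {', '.join(COLUMNS)} FROM entries ORDER BY category_main, category_sub, id"
    for row in conn.execute(query):
        entry = ET.SubElement(root, "entry")
        for col, value in zip(COLUMNS, row):
            ET.SubElement(entry, col).text = value or ""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    put(conn, "writer", category_main="Tools", category_sub="RoboHelp",
        question="How do I generate a CHM?", answer="Compile the project.",
        name="AB", email="ab@example.com", date="2008-12-20")
    change(conn, "editor", 1, "answer", "Compile the project after proofreading.")
    print(export_xml(conn))
```

Because edits happen in the database and the XML is regenerated from it on every compile, there is no second copy of the text to keep in sync, which is what makes stages 2 and 4 of the old process unnecessary.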



Let's see how it goes. I'll need to research a bit.

Related post: Harvest, separate grain from chaff, release to market

2 comments:

Mr. M said...

Hi AB,

I agree with your point about the RH work that needs to be done. But given that we still need to proofread, can that be done in this process?

Going by your points on XML, here are my thoughts:

Transform the XML generated from the database using XSLT.
Use the transformed XML in one of the compilers.
We can also look at options for proofreading at this stage.

I can check which compilers support this.
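The XSLT step suggested above would group the exported entries into pages and a TOC. Python's standard library has no XSLT processor, so as a stand-in this sketch does the same grouping in plain Python with xml.etree.ElementTree, using category_main and category_sub for the TOC as in step 3 of the plan. It assumes the export holds one <entry> element per database row with one child tag per column; the q0001.html file names and the nested-list TOC are hypothetical, and feeding the output to a CHM compiler is not shown.

```python
import html
import xml.etree.ElementTree as ET

# Assumed input: the database export, one <entry> per row, one child tag per column.
SAMPLE = """<ebook>
  <entry><category_main>Tools</category_main><category_sub>RoboHelp</category_sub>
    <question>How do I generate a CHM?</question><answer>Compile the project.</answer></entry>
  <entry><category_main>Tools</category_main><category_sub>RoboHelp</category_sub>
    <question>How do I generate a CHM?</question><answer>Check the TOC first.</answer></entry>
</ebook>"""


def build_pages(xml_text):
    """Return (toc_html, pages): a nested-list TOC and one HTML page per question."""
    root = ET.fromstring(xml_text)
    # category_main -> category_sub -> question -> answers (dicts keep insertion order)
    tree = {}
    for entry in root.iter("entry"):
        main = entry.findtext("category_main", "")
        sub = entry.findtext("category_sub", "")
        question = entry.findtext("question", "")
        answer = entry.findtext("answer", "")
        tree.setdefault(main, {}).setdefault(sub, {}).setdefault(question, []).append(answer)

    pages = {}
    toc = ["<ul>"]
    for main, subs in tree.items():
        toc.append(f"<li>{html.escape(main)}<ul>")
        for sub, questions in subs.items():
            toc.append(f"<li>{html.escape(sub)}<ul>")
            for question, answers in questions.items():
                filename = f"q{len(pages) + 1:04d}.html"  # hypothetical naming scheme
                title = html.escape(question)
                body = "".join(f"<p>{html.escape(a)}</p>" for a in answers)
                pages[filename] = (f"<html><head><title>{title}</title></head>"
                                   f"<body><h1>{title}</h1>{body}</body></html>")
                toc.append(f'<li><a href="{filename}">{title}</a></li>')
            toc.append("</ul></li>")
        toc.append("</ul></li>")
    toc.append("</ul>")
    return "\n".join(toc), pages


if __name__ == "__main__":
    toc_html, pages = build_pages(SAMPLE)
    print(toc_html)
    for name, page in pages.items():
        print(name, page)
```

An XSLT stylesheet could express the same grouping declaratively; writing it in Python here only keeps the examples in one language.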

Mr. M said...

On your second point, that the import does not work because of the Google Docs conversion: I never tried importing files, because we had planned for a completely different requirement. If we were just displaying the responses as they are, I would have gone ahead with importing files. But we had planned to create a drop-down for every response, which would have made importing files more tedious than copying the contents.