How to export a blog as a document

Anyone got smart ideas here? I want to convert my whole blog into a formattable document, including comments, with a view to doing a book format (The Very Best of Evolving Thoughts). I want to be able to edit it, and put it through InDesign, and I want to do the whole thing in one go.

I’ve tried importing WordPress’ XML files, but nothing works. I’ve tried finding an export plugin, but nothing exists (though many have asked for it on the forums). I can export individual posts to InDesign, but that means I can’t convert the resulting markup to Word or Pages.

Any ideas, oh wonderful crowd of readers? Anyone want to write a WXR to RTF filter?
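For what it's worth, here is a rough sketch of what such a filter might look like in Ruby, using only the REXML library that ships with standard Ruby installs. It assumes the WordPress export file (WXR, the XML that Tools → Export produces) keeps posts in `item` elements with `content:encoded` bodies and nested `wp:comment` elements. The function names (`rtf_escape`, `strip_tags`, `wxr_to_rtf`) and the RTF styling are made up for illustration, and the HTML handling is deliberately crude: it throws away all formatting rather than translating it.

```ruby
require "rexml/document"

# Escape a string for RTF. Backslashes and braces get a leading backslash,
# newlines become \line, and anything outside printable ASCII is written
# as a \uN? escape, one per UTF-16 code unit (signed 16-bit, as RTF expects).
def rtf_escape(text)
  text.encode("UTF-16LE").unpack("v*").map do |unit|
    case unit
    when 0x5C, 0x7B, 0x7D then "\\" + unit.chr
    when 0x0A then "\\line "
    when 0x20..0x7E then unit.chr
    else
      signed = unit > 32_767 ? unit - 65_536 : unit
      "\\u#{signed}?"
    end
  end.join
end

# Crude HTML-to-text: drop the tags and keep the text between them.
# Assumption: a real filter would map <em>, <a>, <blockquote> and so on
# to RTF formatting and decode HTML entities.
def strip_tags(html)
  html.to_s.gsub(/<[^>]+>/, "").strip
end

# Convert a WXR export (given as a string) into one RTF document (a string).
def wxr_to_rtf(xml)
  doc = REXML::Document.new(xml)
  out = +"{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"

  doc.elements.each("rss/channel/item") do |item|
    # Only posts; skip pages, attachments and other item types.
    next unless item.elements["wp:post_type"]&.text == "post"

    title = item.elements["title"]&.text.to_s
    body  = strip_tags(item.elements["content:encoded"]&.text)
    out << "{\\pard\\b\\fs32 #{rtf_escape(title)}\\par}\n"
    out << "{\\pard #{rtf_escape(body)}\\par}\n"

    # Comments are nested inside the item they belong to.
    item.elements.each("wp:comment") do |comment|
      author = comment.elements["wp:comment_author"]&.text.to_s
      text   = strip_tags(comment.elements["wp:comment_content"]&.text)
      out << "{\\pard\\li720\\i #{rtf_escape(author)}:\\i0  #{rtf_escape(text)}\\par}\n"
    end
  end

  out << "}\n"
end
```

Something like `File.write("blog.rtf", wxr_to_rtf(File.read("export.xml")))` would then do the whole blog in one go (both file names are placeholders). The resulting RTF should open in Word or Pages, but expect to clean up the copy by hand.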


9 Responses to How to export a blog as a document

  1. Matthew Platte

    Build a screen scraper? http://nokogiri.org/

    Here’s a crude sketch in Ruby:

    require 'nokogiri'
    require 'open-uri'

    # Fetch and parse the post page. URI.open is required on Ruby 3.0+,
    # where a bare open() no longer accepts URLs.
    doc = Nokogiri::HTML(URI.open("http://evolvingthoughts.net/2011/05/how-to-export-a-blog-as-a-document/"))

    # The parts of each post are marked up with these classes:
    #   entry-title, entry-meta, entry-content

    title = doc.at_css("h1.entry-title").text
    puts title

    meta = doc.at_css(".entry-meta").text
    puts meta

    content = doc.at_css(".entry-content").text
    puts content

    More sophisticated parsing, extraction and persistent storage would be necessary.


  2. Chris E

    Does the Anthologize plug-in work for you? http://anthologize.org/about/


  3. I’m with Chris – I’ve used Anthologize for this purpose & it’s not bad. It doesn’t give you perfectly clean copy, but it certainly gives you enough to work with.


  4. Ben Breuer

    Perhaps somebody like Ed Yong, who published from his blog, would know better?

    Can you do any LaTeX wizardry on your WordPress files?


    • I can write a grep file in a number of environments, but I really hoped someone else would do that for me. I used to do that for a living, and it’s really, really boring.


  5. When I had to export my whole blog, I was able to set the RSS feed to display all entries. That’s a cinch to import. But WordPress may not have such a setting.

