How to export a blog as a document
19 May 2011

Anyone got smart ideas here? I want to convert my whole blog, including comments, into a formattable document, with a view to producing a book (The Very Best of Evolving Thoughts). I want to be able to edit it and put it through InDesign, and I want to do the whole thing in one go.

I’ve tried importing WordPress’ XML export files, but nothing works. I’ve tried finding an export plugin, but none exists (though many have asked for one on the forums). I can export individual posts to InDesign, but then I can’t convert the resulting markup to Word or Pages.

Any ideas, oh wonderful crowd of readers? Anyone want to write a WXR-to-RTF filter?
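A WXR-to-RTF filter could look something like the following minimal sketch in Ruby (the language of the one code sample in the comments). It reads a WordPress export file and writes a single RTF file with each published post's title, body paragraphs and approved comments. It assumes the export uses these WXR element names: `content:encoded` for the post body, `wp:post_type`, `wp:status`, and `wp:comment` holding `wp:comment_author`, `wp:comment_content` and `wp:comment_approved`. Check them against your own file. The method names (`wxr_to_rtf`, `rtf_escape`, `html_to_paragraphs`, `child_text`) are hypothetical. HTML handling is deliberately crude: tags are stripped, so links, images and inline formatting are lost. XML parsing uses REXML, which ships with Ruby.

```ruby
require 'rexml/document'
require 'cgi'

# Escape plain text for RTF: backslash and braces are control characters,
# and anything outside ASCII becomes \uN? (N is a signed 16-bit UTF-16 unit).
def rtf_escape(text)
  text.each_char.map do |ch|
    case ch
    when '\\', '{', '}' then "\\#{ch}"
    when "\n" then '\\line '
    else
      if ch.ord < 128
        ch
      else
        ch.encode('UTF-16BE').unpack('n*').map do |unit|
          signed = unit > 32_767 ? unit - 65_536 : unit
          "\\u#{signed}?"
        end.join
      end
    end
  end.join
end

# Crude HTML-to-paragraphs: <br> becomes a line break, </p> a paragraph
# break, every other tag is dropped, and basic entities are decoded.
def html_to_paragraphs(html)
  text = html.gsub(%r{<br\s*/?>}i, "\n")
             .gsub(%r{</p>}i, "\n\n")
             .gsub(/<[^>]+>/, '')
  CGI.unescapeHTML(text).split(/\n\s*\n/).map(&:strip).reject(&:empty?)
end

# Text (including CDATA) of the first direct child with this qualified name.
def child_text(el, qname)
  child = el.elements.find { |c| c.expanded_name == qname }
  child ? child.texts.map(&:value).join : ''
end

def wxr_to_rtf(xml)
  doc = REXML::Document.new(xml)
  out = ["{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}"]
  REXML::XPath.each(doc, '//channel/item') do |item|
    # ASSUMPTION: keep only published posts; skip pages, attachments, drafts.
    next unless child_text(item, 'wp:post_type') == 'post'
    next unless child_text(item, 'wp:status') == 'publish'

    out << "{\\pard\\sa200\\b\\fs32 #{rtf_escape(child_text(item, 'title'))}\\par}"
    html_to_paragraphs(child_text(item, 'content:encoded')).each do |para|
      out << "{\\pard\\sa200 #{rtf_escape(para)}\\par}"
    end

    # Approved comments, indented, author name in italics.
    item.elements.each do |c|
      next unless c.expanded_name == 'wp:comment'
      next unless child_text(c, 'wp:comment_approved') == '1'

      author = rtf_escape(child_text(c, 'wp:comment_author'))
      body = html_to_paragraphs(child_text(c, 'wp:comment_content'))
             .map { |p| rtf_escape(p) }.join('\\line ')
      out << "{\\pard\\li720\\sa200{\\i #{author}:} #{body}\\par}"
    end
  end
  out << '}'
  out.join("\n")
end

# Usage: ruby wxr_to_rtf.rb export.xml blog.rtf   (file names are examples)
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  File.write(ARGV[1], wxr_to_rtf(File.read(ARGV[0])))
end
```

RTF is used here because it is plain text to generate and both Word and Pages can open it, which leaves the editing and InDesign steps to the tools you already use.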
Build a screen scraper? http://nokogiri.org/

Here’s a crude sketch in Ruby:

```ruby
require 'nokogiri'
require 'open-uri'

# Fetch and parse a single post page (URI.open is required on Ruby 3+,
# where plain Kernel#open no longer opens URLs)
url = "https://evolvingthoughts.net/2011/05/how-to-export-a-blog-as-a-document/"
doc = Nokogiri::HTML(URI.open(url))

# The theme marks up each post with the classes
# entry-title, entry-meta and entry-content
title   = doc.at_css("h1.entry-title")&.text
meta    = doc.at_css(".entry-meta")&.text
content = doc.at_css(".entry-content")&.text

puts title
puts meta
puts content
```

More sophisticated parsing, extraction and persistent storage would be necessary.
I just needed a historian to do the research for me! Thanks, Chris – I’ll report back on how well it works.
I’m with Chris – I’ve used Anthologize for this purpose and it’s not bad. It doesn’t give you perfectly clean copy, but it certainly gives you enough to work with.
Perhaps somebody like Ed Yong, who published from his blog, would know better? Can you do any LaTeX wizardry with your WordPress files?
I can write a grep file in a number of environments, but I was really hoping someone else would do that for me. I used to do that for a living, and it’s really, really boring.
When I had to export my whole blog, I was able to set the RSS feed to display all entries. That’s a cinch to import. But WordPress may not have such a setting.