How to export a blog as a document

19 May 2011

Anyone got smart ideas here? I want to convert my whole blog, including comments, into a formattable document, with a view to producing a book (The Very Best of Evolving Thoughts). I want to be able to edit it and put it through InDesign, and I want to do the whole thing in one go. I’ve tried importing WordPress’ XML export files, but nothing works. I’ve tried finding an export plugin, but nothing exists (though many have asked for one on the forums). I can export individual posts to InDesign, but then I can’t convert the resulting markup to Word or Pages. Any ideas, oh wonderful crowd of readers? Anyone want to write a WXR to RTF filter?
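A WXR (WordPress eXtended RSS) to RTF filter can be sketched in Ruby with only the standard library's REXML parser. Treat this as a rough sketch, not a finished converter: it assumes the export uses the usual `content:` and `wp:` element prefixes, keeps only published posts and approved comments, and strips HTML crudely, so images, links and embeds are lost. The function names are made up for illustration.

```ruby
require 'rexml/document'

# Crude HTML-to-text: line and paragraph breaks become newlines, all other
# tags are dropped, and only a few common entities are decoded.
def strip_tags(html)
  html.gsub(%r{<br\s*/?>|</p>}i, "\n")
      .gsub(/<[^>]+>/, "")
      .gsub("&nbsp;", " ").gsub("&lt;", "<").gsub("&gt;", ">")
      .gsub("&quot;", '"').gsub("&amp;", "&")
      .strip
end

# Escape RTF control characters; non-ASCII characters become \uN? escapes,
# where N is a signed 16-bit UTF-16 code unit.
def rtf_escape(text)
  text.each_char.map do |ch|
    if ["\\", "{", "}"].include?(ch) then "\\#{ch}"
    elsif ch == "\n" then "\\par\n"
    elsif ch.ord < 128 then ch
    else
      ch.encode("UTF-16BE").unpack("n*").map { |u| "\\u#{u > 32_767 ? u - 65_536 : u}?" }.join
    end
  end.join
end

# Text of the first child whose prefixed name matches, e.g. "content:encoded".
def child_text(el, qname)
  child = el.elements.find { |e| e.expanded_name == qname }
  child ? child.texts.map(&:value).join : ""
end

# Published posts only (pages and attachments are skipped), with approved comments.
def wxr_posts(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//item").filter_map do |item|
    next unless child_text(item, "wp:post_type") == "post" &&
                child_text(item, "wp:status") == "publish"
    comments = item.elements.select { |e| e.expanded_name == "wp:comment" }
                   .select { |c| child_text(c, "wp:comment_approved") == "1" }
                   .map { |c| { author: child_text(c, "wp:comment_author"),
                                text: strip_tags(child_text(c, "wp:comment_content")) } }
    { title: child_text(item, "title"),
      body: strip_tags(child_text(item, "content:encoded")),
      comments: comments }
  end
end

# One bold title, the body, indented italic comments, then a page break per post.
def to_rtf(posts)
  out = +"{\\rtf1\\ansi\\deff0 {\\fonttbl {\\f0 Times New Roman;}}\n"
  posts.each do |post|
    out << "{\\pard\\b\\fs32 " << rtf_escape(post[:title]) << "\\par}\n"
    out << "{\\pard " << rtf_escape(post[:body]) << "\\par}\n"
    post[:comments].each do |c|
      out << "{\\pard\\li720\\i " << rtf_escape("#{c[:author]}: #{c[:text]}") << "\\par}\n"
    end
    out << "\\page\n"
  end
  out << "}"
end

# Usage (file names are placeholders):
# File.write("blog.rtf", to_rtf(wxr_posts(File.read("export.xml"))))
```

RTF is used here because Word, Pages and InDesign can all open it, and it is simple enough to write by hand; the styling is deliberately minimal so it can be redone in InDesign.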
Build a screen scraper? http://nokogiri.org/ Here’s a crude sketch in Ruby:

```ruby
require 'nokogiri'
require 'open-uri'

url = "https://evolvingthoughts.net/2011/05/how-to-export-a-blog-as-a-document/"
# URI.open fetches the page; newer Rubies no longer let bare Kernel#open take a URL
doc = Nokogiri::HTML(URI.open(url))

# The theme marks up each post with these CSS classes:
# entry-title, entry-meta, entry-content
# (&. returns nil instead of raising if a selector matches nothing)
title = doc.at_css("h1.entry-title")&.text
meta = doc.at_css(".entry-meta")&.text
content = doc.at_css(".entry-content")&.text

puts title, meta, content
```

More sophisticated parsing, extraction and persistent storage would be necessary.
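For the persistent-storage part, one simple option is a single JSON file keyed by post URL, so a long scraping run can be interrupted and resumed without refetching pages. `PostStore` is a hypothetical helper, not part of Nokogiri; it uses only Ruby's standard `json` library. After scraping a page, call `save` with the URL and the title, meta and content strings extracted from it, and check `seen?` before fetching the next one.

```ruby
require 'json'

# Hypothetical helper: keeps scraped posts in one JSON file keyed by URL,
# so an interrupted scraping run can be resumed.
class PostStore
  def initialize(path)
    @path = path
    # Start empty if no earlier run has written the file yet
    @posts = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # True if this URL was stored by this or an earlier run
  def seen?(url)
    @posts.key?(url)
  end

  # Record one post; rewrites the whole file each time (simple, not fast)
  def save(url, title:, meta:, content:)
    @posts[url] = { "title" => title, "meta" => meta, "content" => content }
    File.write(@path, JSON.pretty_generate(@posts))
  end

  # Stored record for a URL as a Hash with string keys, or nil
  def [](url)
    @posts[url]
  end
end
```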
I just needed a historian to do the research for me! Thanks, Chris – I’ll report back on how well it works.
I’m with Chris – I’ve used Anthologize for this purpose & it’s not bad. It doesn’t give you perfectly clean copy, but it certainly gives you enough to work with.
Perhaps somebody like Ed Yong, who published from his blog, would know better? Can you do any LaTeX wizardry on your WordPress files?
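On the LaTeX route, most of the chore is escaping the ten characters LaTeX treats as special in running text. The sketch below assumes each post has already been pulled out of WordPress into a hash with `:title`, `:body` and `:comments` keys (a made-up shape) and emits one `\chapter` per post, so it needs a `book` or `report` document class.

```ruby
# Replacements for the characters LaTeX treats specially in running text.
LATEX_ESCAPES = {
  "\\" => "\\textbackslash{}", "{" => "\\{", "}" => "\\}", "$" => "\\$",
  "&" => "\\&", "#" => "\\#", "%" => "\\%", "_" => "\\_",
  "^" => "\\textasciicircum{}", "~" => "\\textasciitilde{}"
}.freeze

def latex_escape(text)
  # Regexp.union escapes each key; gsub looks each match up in the hash
  text.gsub(Regexp.union(LATEX_ESCAPES.keys), LATEX_ESCAPES)
end

# Hypothetical post shape: { title:, body:, comments: [{ author:, text: }] }
def post_to_latex(post)
  # Blank lines separate paragraphs in LaTeX, so single newlines are doubled
  body = latex_escape(post[:body]).gsub(/\n+/, "\n\n")
  lines = ["\\chapter{#{latex_escape(post[:title])}}", "", body, ""]
  post[:comments].each do |c|
    lines << "\\begin{quote}\\textbf{#{latex_escape(c[:author])}:} " \
             "#{latex_escape(c[:text])}\\end{quote}"
  end
  lines.join("\n")
end
```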
I can write a grep file in a number of environments, but I really hoped someone else would do that for me. I used to do that for a living, and it’s really, really boring.
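The grep-file approach boils down to an ordered list of search-and-replace rules run over the exported text. Here is that idea as a Ruby sketch rather than a grep/sed script; the two shortcode rules are assumptions based on WordPress.com-style `[youtube=URL]` and `[caption ...]...[/caption]` shortcodes, and a real export would need more rules for whatever else turns up in it.

```ruby
# Hypothetical clean-up rules, applied in order like lines of a sed script.
CLEANUP_RULES = [
  # [youtube=URL] -> a plain line the editor can turn into a footnote
  [/\[youtube=([^\]\s]+)\]/, 'Video: \1'],
  # [caption ...]inner[/caption] -> keep only the inner content
  [%r{\[caption[^\]]*\](.*?)\[/caption\]}m, '\1'],
  # Collapse three or more newlines into one blank line
  [/\n{3,}/, "\n\n"]
].freeze

def clean_shortcodes(text)
  CLEANUP_RULES.reduce(text) { |acc, (pattern, replacement)| acc.gsub(pattern, replacement) }
end
```

Keeping the rules as data rather than code makes it easy to add one each time a new kind of debris shows up in the output.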
When I had to export my whole blog, I was able to set the RSS feed to display all entries. That’s a cinch to import. But WordPress may not have such a setting.