How to export a blog as a document
19 May 2011

Anyone got smart ideas here? I want to convert my whole blog, including comments, into a formattable document, with a view to producing a book (The Very Best of Evolving Thoughts). I want to be able to edit it and put it through InDesign, and I want to do the whole thing in one go.

I've tried importing WordPress's XML export files, but nothing works. I've tried finding an export plugin, but none exists (though many have asked for one on the forums). I can export individual posts to InDesign, but then I can't convert the resulting markup to Word or Pages.

Any ideas, oh wonderful crowd of readers? Anyone want to write a WXR-to-RTF filter?
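Taking up the WXR-to-RTF idea, here is a rough sketch in Ruby using the REXML library that ships with Ruby. It assumes the usual WXR layout, in which each post is an `item` carrying `wp:post_type`, `wp:status`, `content:encoded` and nested `wp:comment` elements; check those names against your own export file. The function names are hypothetical, the HTML handling is a crude regex strip rather than a real converter, and the output is bare RTF, which Word, Pages and InDesign can all import.

```ruby
require "rexml/document"

# Direct child elements whose name, as written in the file, equals +qname+
# (e.g. "wp:comment"). Matching the written prefix sidesteps the namespace URI
# differences between WXR versions.
def children_named(el, qname)
  el.children.select { |c| c.is_a?(REXML::Element) && c.expanded_name == qname }
end

# Concatenated text (CDATA included) of the first child named +qname+, or "".
def child_text(el, qname)
  child = children_named(el, qname).first
  child ? child.texts.map(&:value).join : ""
end

# Crude HTML-to-text: <br> and </p> become newlines, other tags are dropped,
# a few entities are decoded, and runs of blank lines collapse to one break.
def html_to_text(html)
  html.delete("\r")
      .gsub(%r{<br\s*/?>|</p>}i, "\n")
      .gsub(/<[^>]+>/, "")
      .gsub(/&#(\d+);/) { [$1.to_i].pack("U") }
      .gsub("&lt;", "<").gsub("&gt;", ">").gsub("&quot;", '"').gsub("&amp;", "&")
      .gsub(/\n\s*\n+/, "\n")
      .strip
end

# Escape text for RTF: backslash and braces are control characters, newlines
# become paragraph breaks, and non-ASCII characters become \uN? escapes
# (N is a signed 16-bit UTF-16 code unit, "?" the fallback character).
def rtf_escape(text)
  text.each_char.map do |ch|
    if ch == "\\" || ch == "{" || ch == "}"
      "\\" + ch
    elsif ch == "\n"
      "\\par "
    elsif ch.ord < 128
      ch
    else
      ch.encode("UTF-16BE").unpack("n*").map { |u| "\\u#{u > 32_767 ? u - 65_536 : u}?" }.join
    end
  end.join
end

# Hypothetical filter: every published post becomes a bold title, an italic
# date, its body and its approved comments, with a page break between posts.
def wxr_to_rtf(wxr_xml)
  doc = REXML::Document.new(wxr_xml)
  out = ["{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"]
  REXML::XPath.each(doc, "/rss/channel/item") do |item|
    next unless child_text(item, "wp:post_type") == "post" &&
                child_text(item, "wp:status") == "publish"

    out << "{\\b #{rtf_escape(child_text(item, 'title'))}}\\par\n"
    out << "{\\i #{rtf_escape(child_text(item, 'wp:post_date'))}}\\par\n"
    out << rtf_escape(html_to_text(child_text(item, "content:encoded"))) << "\\par\n"
    children_named(item, "wp:comment").each do |c|
      next unless child_text(c, "wp:comment_approved") == "1"

      out << "{\\b #{rtf_escape(child_text(c, 'wp:comment_author'))}:} "
      out << rtf_escape(html_to_text(child_text(c, "wp:comment_content"))) << "\\par\n"
    end
    out << "\\page\n"
  end
  out << "}"
  out.join
end
```

Usage, with an assumed export filename: `File.write("blog.rtf", wxr_to_rtf(File.read("export.xml")))`. Quotes, images and embedded markup will still need hand-tidying after import.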
Build a screen scraper? http://nokogiri.org/

Here's a crude sketch in Ruby:

```ruby
require 'nokogiri'
require 'open-uri'

# URI.open, not Kernel#open: Ruby 3.0 removed open-uri's URL handling from Kernel#open
doc = Nokogiri::HTML(URI.open("https://evolvingthoughts.net/2011/05/how-to-export-a-blog-as-a-document/"))

# The post markup uses the CSS classes entry-title, entry-meta and entry-content
title = doc.at_css("h1.entry-title").text
puts title
meta = doc.at_css(".entry-meta").text
puts meta
content = doc.at_css(".entry-content").text
puts content
```

More sophisticated parsing, extraction and persistent storage would be necessary.
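For the persistent-storage part, one simple option (an assumption, not something the scraper sketch already does) is to append each scraped post to a JSON Lines file, one JSON object per line, so an interrupted crawl can resume where it stopped and the results can be stitched into a single document afterwards. The helper names and the `posts.jsonl` filename are hypothetical, and how the list of post URLs gets collected is left open.

```ruby
require "json"
require "set"

# Hypothetical helper: append one post (a Hash with string keys such as
# "url", "title", "meta", "content") to the store as a single JSON line.
def save_post(store_path, post)
  File.open(store_path, "a") { |f| f.puts(JSON.generate(post)) }
end

# Read every stored post back, in the order it was saved.
def load_posts(store_path)
  return [] unless File.exist?(store_path)

  File.readlines(store_path, chomp: true)
      .reject(&:empty?)
      .map { |line| JSON.parse(line) }
end

# URLs already in the store, so a restarted crawl can skip them.
def scraped_urls(store_path)
  load_posts(store_path).map { |post| post["url"] }.to_set
end

# Concatenate all stored posts into one plain-text file for editing.
def write_document(store_path, out_path)
  text = load_posts(store_path).map do |post|
    [post["title"], post["meta"], post["content"]].compact.join("\n\n")
  end
  File.write(out_path, text.join("\n\n\n"))
end

# Usage inside a scraping loop (post_urls is an assumed list of post URLs):
#   done = scraped_urls("posts.jsonl")
#   post_urls.each do |url|
#     next if done.include?(url)
#     doc = Nokogiri::HTML(URI.open(url))
#     save_post("posts.jsonl", {
#       "url" => url,
#       "title" => doc.at_css("h1.entry-title").text,
#       "meta" => doc.at_css(".entry-meta").text,
#       "content" => doc.at_css(".entry-content").text
#     })
#   end
#   write_document("posts.jsonl", "blog.txt")
```

Appending one line per post keeps each write small, so a crash loses at most the post in progress.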
I just needed a historian to do the research for me! Thanks, Chris – I'll report back on how well it works.
I'm with Chris – I've used Anthologize for this purpose & it's not bad. It doesn't give you perfectly clean copy, but it certainly gives you enough to work with.
Perhaps somebody like Ed Yong, who published from his blog, would know better? Can you do any LaTeX wizardry on your WordPress files?
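On the LaTeX route, a minimal sketch: assuming each post has already been reduced to plain text with a title and a date, the Ruby below escapes LaTeX's special characters and wraps the posts in a `book`-class document, one unnumbered section per post. The helper names are hypothetical, and HTML markup, images and footnotes are not handled.

```ruby
# LaTeX's special characters and safe replacements for running text.
LATEX_ESCAPES = {
  "\\" => "\\textbackslash{}",
  "{" => "\\{",
  "}" => "\\}",
  "$" => "\\$",
  "&" => "\\&",
  "#" => "\\#",
  "%" => "\\%",
  "_" => "\\_",
  "~" => "\\textasciitilde{}",
  "^" => "\\textasciicircum{}"
}.freeze

# Escape plain text so LaTeX typesets it literally. A single gsub pass means
# the backslashes inserted by earlier replacements are never re-escaped.
def latex_escape(text)
  text.gsub(/[\\{}$&#%_~^]/) { |ch| LATEX_ESCAPES[ch] }
end

# Hypothetical builder: posts are hashes with :title, :date and :body (plain
# text, paragraphs separated by blank lines).
def posts_to_latex(posts)
  sections = posts.map do |post|
    header = "\\section*{#{latex_escape(post[:title])}}\n\\emph{#{latex_escape(post[:date])}}"
    paragraphs = post[:body].split(/\n\s*\n/).map { |para| latex_escape(para.strip) }
    ([header] + paragraphs).join("\n\n")
  end
  (["\\documentclass{book}", "\\begin{document}"] + sections + ["\\end{document}"]).join("\n\n") + "\n"
end
```

The resulting `.tex` file can then be edited and typeset like any other LaTeX source; switching `\section*` to `\chapter` would give each post its own chapter instead.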
I can write grep scripts in a number of environments, but I was really hoping someone else would do it for me. I used to do that for a living, and it's really, really boring.
When I had to export my whole blog, I was able to set the RSS feed to display all entries. That’s a cinch to import. But WordPress may not have such a setting.
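If a feed can be made to carry every entry (in WordPress the count is, I believe, set by the "Syndication feeds show the most recent" option under Settings → Reading), importing it can be as simple as the Ruby sketch below. It assumes an RSS 2.0 feed with a `pubDate` on each item, prefers the full-text `content:encoded` element over `description`, and sorts the posts oldest first, since feeds list newest first and a book wants chronological order. The function names are hypothetical.

```ruby
require "rexml/document"
require "time"

# Text (CDATA included) of the first direct child whose name, as written in
# the feed, equals +qname+ (e.g. "title" or "content:encoded"); nil if absent.
def feed_child_text(item, qname)
  child = item.children.find { |c| c.is_a?(REXML::Element) && c.expanded_name == qname }
  child && child.texts.map(&:value).join
end

# Parse an RSS 2.0 feed string into hashes (:title, :date, :body), oldest first.
def feed_items_oldest_first(rss_xml)
  doc = REXML::Document.new(rss_xml)
  items = REXML::XPath.match(doc, "/rss/channel/item").map do |item|
    {
      title: feed_child_text(item, "title").to_s,
      date: Time.rfc2822(feed_child_text(item, "pubDate")),
      body: feed_child_text(item, "content:encoded") || feed_child_text(item, "description").to_s
    }
  end
  items.sort_by { |i| i[:date] }
end

# Usage, with an assumed saved copy of the feed:
#   feed_items_oldest_first(File.read("feed.xml")).each do |post|
#     puts post[:title], post[:date].strftime("%d %b %Y"), post[:body], ""
#   end
```

The bodies still contain HTML, so they would need the same kind of tag stripping as any other import route before going into Word or InDesign.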