Migrate from Blogger to Jekyll

Goal

Migrate a blog from Blogger (blogspot.com) to Jekyll.

todo

Pre-reqs

Turn on full RSS feeds for your blog:

  1. Navigate to your blog http://yourblog.blogspot.com
  2. Sign in
  3. Click on the edit link for any post
  4. Click settings
  5. Click site feed
  6. Click advanced mode
  7. Select full for all options
  8. Save

Import

The caveat is that you lose either a lot of formatting or a lot of your time. You pick.

Use vilcans' Jekyll rss_importer

BLOGGER=coolaj86

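# Fetch vilcans' fork and create a local branch from the importer branch
git clone http://github.com/vilcans/jekyll.git
cd jekyll/
git branch -a
git checkout -b rss_importer origin/rss_importer

# The importer writes into _posts and fails if the directory is missing
mkdir -p _posts

# require "YAML" raises a LoadError (see Possible Errors), so use the lowercase name
sed -i 's/require "YAML"/require "yaml"/' ./lib/jekyll/converters/rss.rb

# Quote the URL so the shell does not treat "?" as a glob character
wget "http://${BLOGGER}.blogspot.com/feeds/posts/default?alt=rss" -O "${BLOGGER}.rss.xml"
ruby -r './lib/jekyll/converters/rss' -e 'Jekyll::RSS.process("'"${BLOGGER}"'.rss.xml")'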
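
A quick sanity check after the import is to compare the number of items in the downloaded feed with the number of files the importer wrote. This is only a sketch: it assumes each post appears as an `<item>` element in the RSS file and that posts land directly in `_posts`. The function names `count_feed_items` and `count_posts` are made up for illustration.

```shell
# Count <item> elements in an RSS file (hypothetical helper).
# grep -o prints one line per match, so this also works when the
# whole feed sits on a single line.
count_feed_items() {
  grep -o '<item>' "$1" | wc -l | tr -d ' '
}

# Count regular files directly inside a posts directory (hypothetical helper).
count_posts() {
  find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}
```

Run `count_feed_items "${BLOGGER}.rss.xml"` and `count_posts _posts` from the jekyll checkout. If the second number is lower, some items were probably skipped, for example with the "No content in RSS item" message described under Possible Errors.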

Use the by-hand approach

BLOGGER=coolaj86

wget --convert-links --html-extension --mirror --random-wait --wait 3 http://${BLOGGER}.blogspot.com/

Essentially you would want to parse

If you write a script to strip out all of the garbage and keep the post + formatting, I'd love to hear about it.

Here's a post that will get you halfway to converting HTML to Markdown.
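
As one possible starting point for the by-hand route, the mirrored pages can be pushed through a general-purpose converter. The sketch below swaps in pandoc, which is not what this page originally used, and assumes it is installed. It converts each whole page, sidebar and all, so it does not solve the "strip out the garbage" part. `convert_mirror` is a hypothetical helper name.

```shell
# Convert every mirrored HTML page under SRC_DIR to Markdown in OUT_DIR.
# Assumes pandoc is on the PATH. Pages that share a base name in
# different directories will overwrite each other in OUT_DIR.
convert_mirror() {
  src_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  find "$src_dir" -type f -name '*.html' | while IFS= read -r page; do
    name=$(basename "$page" .html)
    pandoc -f html -t markdown "$page" -o "$out_dir/$name.md"
  done
}
```

For example, `convert_mirror "${BLOGGER}.blogspot.com" markdown_posts` would walk the directory that `wget --mirror` created. The output still needs front matter and manual cleanup before Jekyll can use it.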

Categorize by Blog

I'm using Fastr as a template for my blog. Fastr supports categories with vanilla Jekyll.

Here's a script I used to go through one of my blogs, which was created back when there was no title field:

BLOG=coolaj86
ID=0 # Fastr doesn't allow posts of the same name

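cd "${BLOG}_posts"
for POST in *.html; do
  # Fill in a placeholder title and add a categories line (GNU sed: \n in the replacement)
  sed -i "s/^title:/title: untitled ${ID}\ncategories: ${BLOG} uncategorized/" "${POST}"
  # Append the ID so every post gets a unique file name
  mv "${POST}" "$(basename "${POST}" .html)_${ID}.html"
  ID=$((ID + 1))
done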

And the other, which thankfully did have titles:

BLOG=thesystemisntdown

cd "${BLOG}_posts"
for POST in *.html; do
  # Insert the categories line right after the existing title (GNU sed)
  sed -i "s/^\(title:.*\)/\1\ncategories: ${BLOG} uncategorized/" "${POST}"
done

And then, to give them the Fastr layout:

for P in *.html; do
  sed -i "s/layout: post/layout: article/" "${P}"
done
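
After running the loops it may be worth confirming that every post actually picked up a `categories:` line, since `sed` silently leaves a file alone when the pattern does not match. A minimal sketch, assuming the posts are `.html` files in a single directory; `posts_missing_categories` is a hypothetical helper name.

```shell
# Print the path of every .html post in DIR that has no line
# starting with "categories:" (hypothetical helper).
posts_missing_categories() {
  for post in "$1"/*.html; do
    [ -e "$post" ] || continue   # the glob matched nothing
    grep -q '^categories:' "$post" || echo "$post"
  done
}
```

Calling `posts_missing_categories "${BLOG}_posts"` should print nothing once every post has been categorized.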

Possible Errors

If you didn't enable full rss feeds (and click save):

No content in RSS item '2006_03_01_archive'
Created 0 posts!

If you didn't replace "YAML" with "yaml":

/home/user/jekyll/lib/jekyll/converters/rss.rb:5:in `require': no such file to load -- YAML (LoadError)
  from /home/user/jekyll/lib/jekyll/converters/rss.rb:5:in `<module:Jekyll>'
  from /home/user/jekyll/lib/jekyll/converters/rss.rb:1:in `<top (required)>'
  from ruby:0:in `require'

If you don't have a _posts directory:

 http://coolaj86.blogspot.com/2010_05_01_archive.html#8976446356395410673 -> _posts/2010-05-06-2010_05_01_archive.html
/home/user/jekyll/lib/jekyll/converters/rss.rb:39:in `initialize': No such file or directory - _posts/2010-05-06-2010_05_01_archive.html (Errno::ENOENT)
  from /home/user/jekyll/lib/jekyll/converters/rss.rb:39:in `open'
  from /home/user/jekyll/lib/jekyll/converters/rss.rb:39:in `block in process'
  from /usr/local/lib/ruby/1.9.1/rexml/element.rb:906:in `block in each'
  from /usr/local/lib/ruby/1.9.1/rexml/xpath.rb:64:in `each'
  from /usr/local/lib/ruby/1.9.1/rexml/xpath.rb:64:in `each'
  from /usr/local/lib/ruby/1.9.1/rexml/element.rb:906:in `each'
  from /home/user/jekyll/lib/jekyll/converters/rss.rb:16:in `process'
  from -e:1:in `<main>' 
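
The last two errors can be caught before running the importer. The sketch below checks for the `_posts` directory and the uppercase `require "YAML"` line, plus whether the feed file exists at all. It does not detect a feed that was downloaded without full content. `preflight` is a hypothetical helper name, and the paths assume you run it from the root of the jekyll checkout.

```shell
# Check a few preconditions for the RSS importer.
# Returns 0 if everything looks fine, 1 otherwise.
preflight() {
  feed=$1
  status=0
  if [ ! -s "$feed" ]; then
    echo "feed file '$feed' is missing or empty"
    status=1
  fi
  if [ ! -d _posts ]; then
    echo "no _posts directory (run: mkdir -p _posts)"
    status=1
  fi
  if grep -q 'require "YAML"' lib/jekyll/converters/rss.rb 2>/dev/null; then
    echo 'rss.rb still has require "YAML" (replace it with "yaml")'
    status=1
  fi
  return "$status"
}
```

For example, `preflight "${BLOGGER}.rss.xml" && echo ready` prints `ready` only when all three checks pass.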
Updated at 2010-08-21