I’m trying to convert a blog into an EPUB and keep running into issues with existing tools.
I first tried blog2epub, but it fails during parsing with:
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: meta line 10 and head, line 17, column 8
I then tried WebToEpub on Firefox, providing:
- Content selector: .article-content
- Chapter title selector: .title
It generated an EPUB, but the file wouldn’t open in any reader.
What I’m looking for is a tool where I can point to a blog’s base URL, define CSS selectors for the article title and body, and have it automatically fetch all entries and create one chapter per post. Or something similar.
Does anyone know of a reliable tool, script, or workflow that does this well on Linux?


HTML5 as it is actually used in production is only partially convertible; the conversion is lossy, so you need to get hands-on with it. *
One way around it: use a Markdown editor that can convert content copied and pasted from the web (I know of Typora, which also fetches and optionally saves the images), then run the result through pandoc.
* div#main, a.h1, divs with naked text, I’ve seen things…
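
For the pandoc step, here is a minimal sketch, assuming each post has already been saved as its own Markdown file in one folder and starts with a level-1 heading (pandoc splits EPUB chapters at level-1 headings by default). The function names, the folder name "posts", and the file-name ordering are my own assumptions for illustration, not something pandoc or Typora prescribes.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(posts_dir, output, title):
    """Return the pandoc argument list that merges all .md files into one EPUB."""
    # Files are sorted by name, so prefix them with a number or a date
    # (e.g. 01-first-post.md) to control the chapter order.
    sources = sorted(str(p) for p in Path(posts_dir).glob("*.md"))
    if not sources:
        raise ValueError(f"no .md files found in {posts_dir}")
    return [
        "pandoc",
        "--from=markdown",
        "--to=epub",
        f"--metadata=title:{title}",  # EPUB readers expect a book title
        "--toc",                      # add a table of contents
        f"--output={output}",
        *sources,
    ]


def make_epub(posts_dir, output, title):
    """Run pandoc on the collected posts (hypothetical helper, needs pandoc on PATH)."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(build_pandoc_command(posts_dir, output, title), check=True)
```

Calling make_epub("posts", "blog.epub", "My Blog Archive") would then produce a single file with one chapter per post, provided every post really does open with a "# Title" line. Images referenced by relative path are resolved from the directory you run it in, so run it from wherever the editor saved them.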