Thursday, May 29, 2003
A Proposal for a Blogging API
There is talk in blogging circles about standardizing on a single blogging API. As Evan Williams (of Blogger) says:
We would love if there were one universal API for blogging tools. It's clear why this would benefit everyone. Our edict at Google is to help the blogging industry. Raise all boats. And, as I've said before, we're not interested in doing more work for the fun of it.
The consensus seems to be that the community needs an all-singing, all-dancing API, implemented by all blogging software, in order to prosper. But I am not a fan of all-singing, all-dancing APIs: they tend to be either too simplistic in what they support or overly complex in trying to cover everything currently being done. Too simple, and it's useless if the underlying software can do more. Too complex, and it's hard to test and hard to write for, with excessive data required just because some obscure product from Andytown, Florida provides it and no one else does, yet everyone has to support it anyway.
There's even a comparison between competing APIs, which is fine as far as it goes, but it only covers what exists now (or will very shortly). What it does not do is look at what features existing blogging software has, or at possible future features (fotoblogs are a relatively recent phenomenon that may or may not be easily supported by the existing APIs). If you are going to do an all-singing, all-dancing API (since some of the early blogging APIs are too simple in what they allow), then you might as well figure out which features are common across blogs and which aren't, and from that, how to extend the API without breaking anything.
And hopefully without making it so heavyweight that it's a pain to use. I personally think it's still too early, and the problem not yet well enough understood, to get a clean, well-designed API in place, but I can try anyway.
I've looked over the existing APIs and some of the proposed ones and made notes on what is needed, what should be optional, and how to provide for extensions. For the moment I'm going to skip the actual transport protocol (DCE RPC, XML-RPC or SOAP via HTTP over TCP, other alphabet-soup protocols, etc.) and instead concentrate on what the API should do and what data it needs (borrowing liberally from the existing APIs as I go).
Of primary concern is authentication. While it would be nice to use the authentication methods built into HTTP, not all transport protocols run exclusively over HTTP (SOAP, for example), and we can't rely on every webserver letting end users configure this for their sites (most, yes, but not all). So including some method of authentication in the API itself is probably a Good Thing™. In the definitions that follow, fields are optional and of type string unless otherwise specified, fields marked BINARY can contain arbitrary binary data, and default values (if any) are given after an equals sign:
auth DATA
{
  method      : REQUIRED ENUMERATION = 'Basic',
  userid      : REQUIRED,
  credentials : REQUIRED BINARY
}
The method would most likely be Basic, in which case the credentials would be the password (possibly base64-encoded, much like basic authentication in HTTP). Since the authentication method is included, it can easily be extended for greater security.
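To make the Basic method concrete, here is a minimal Python sketch of a client building auth DATA and a server checking it. The Auth class, basic_auth and check_auth are hypothetical names, and the plain password dictionary stands in for whatever the server really stores. Note that base64 is only an encoding, not encryption, which is exactly why having an extensible method field matters.

```python
import base64
from dataclasses import dataclass


@dataclass
class Auth:
    """Hypothetical Python mapping of the proposed auth DATA."""
    userid: str
    credentials: bytes          # BINARY in the proposal
    method: str = "Basic"       # ENUMERATION, defaulting to 'Basic'


def basic_auth(userid: str, password: str) -> Auth:
    """Client side: for 'Basic', the credentials are the password,
    base64-encoded the way HTTP Basic authentication encodes its token."""
    return Auth(userid=userid,
                credentials=base64.b64encode(password.encode("utf-8")))


def check_auth(auth: Auth, passwords: dict) -> bool:
    """Server side: verify Basic credentials against stored passwords.

    Unknown methods are rejected, which leaves room to add stronger
    methods later without changing the layout of auth DATA."""
    if auth.method != "Basic":
        return False
    try:
        password = base64.b64decode(auth.credentials, validate=True).decode("utf-8")
    except ValueError:          # bad base64 (binascii.Error) or bad UTF-8
        return False
    return passwords.get(auth.userid) == password
```

For example, basic_auth("alice", "secret") carries the credentials b"c2VjcmV0", and a server knowing that password accepts it.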
Some of the APIs also require data about the client software (Blogger required this, and Google requires it to use their API), so we might as well define data for that:
client DATA
{
  appkey : REQUIRED,
  id,
  version
}
Before getting to the data and methods for posts, one deficiency I've noticed is the rather poor support for users (defined as people who can post to the blog; note the plural). The existing APIs seem to assume a blog has a single author. That's not to say a blog with multiple authors can't use them, but it has to be shoehorned in, and there is no provision for the owner of the blog (or anyone else with administrative rights) to allow or disallow others to post entries. Another thing to consider is that a person might be allowed to post to multiple blogs (assuming all the blogs in question reside on the same server; think LiveJournal or Blogger). There should also perhaps be a way to create a new blog on the server through this proposed API:
user DATA
{
  userid   : REQUIRED,
  fullname,
  email,
  blogs    DATA[]        // this is an array
  {
    blogid : REQUIRED,
    rights : REQUIRED ENUMERATION
  }
}

// --------------------------------------
// The catagory enumeration consists of
// 'none', 'light' or 'heavy'; more on
// this below.
// --------------------------------------

features DATA
{
  anonposts : BOOLEAN = 'false',   // more on this way below
  comments  : BOOLEAN = 'false',   // comments not supported
  trackback : BOOLEAN = 'false',   // trackback not supported
  templates : BOOLEAN = 'false',   // template editing not supported
  clientid  : BOOLEAN = 'false',   // client data not required
  catagory  : ENUMERATION = 'none'
}

blog DATA
{
  blogid    : REQUIRED,
  features  : REQUIRED features DATA,
  fullname,
  url,
  startdate : DATE-ISO8601,
  users     DATA[]
  {
    userid : REQUIRED,
    rights : REQUIRED ENUMERATION
  },
  templateid[],
  catagoryid[]
}

STRING user.edit        // returns userid
(
  auth    : REQUIRED auth DATA,   // user logging in
  blogid  : REQUIRED,
  user    : REQUIRED user DATA,
  newauth : REQUIRED auth DATA,   // for user being added
  client  : client DATA
)

user DATA user.info
(
  auth   : REQUIRED auth DATA,
  blogid : REQUIRED,
  userid : REQUIRED,
  client : client DATA
)

BOOLEAN user.delete
(
  auth   : REQUIRED auth DATA,
  blogid : REQUIRED,
  userid : REQUIRED,
  client : client DATA
)

STRING blog.edit        // returns blogid
(
  auth   : REQUIRED auth DATA,
  blog   : REQUIRED blog DATA,
  client : client DATA
)

blog DATA blog.info
(
  auth   : REQUIRED auth DATA,
  blogid : REQUIRED,
  client : client DATA
)

BOOLEAN blog.delete
(
  auth   : REQUIRED auth DATA,
  blogid : REQUIRED,
  client : client DATA
)
The purpose of features DATA is to indicate which portions of the API the blog supports and which it doesn't. So if templates is false, there is no use in the client calling the template portion of the API (categories are handled somewhat differently, but I'll get to that). The rights enumeration would probably be something like owner, create (an entry or post), edit (can edit own posts), delete, edit-others and delete-others, with the owner able to do all of the above. The templateid array contains a list of the currently defined templates that can be manipulated (if supported):
template DATA
{
  templateid : REQUIRED,
  body       : REQUIRED STRING,
  type       : ENUMERATION,
  name
}

STRING template.edit    // returns templateid
(
  auth     : REQUIRED auth DATA,
  blogid   : REQUIRED,
  template : REQUIRED template DATA,
  client   : client DATA
)

template DATA template.info
(
  auth       : REQUIRED auth DATA,
  blogid     : REQUIRED,
  templateid : REQUIRED,
  client     : client DATA
)

BOOLEAN template.delete
(
  auth       : REQUIRED auth DATA,
  blogid     : REQUIRED,
  templateid : REQUIRED,
  client     : client DATA
)
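A client would consult features DATA and the user's rights before touching the template calls. The following Python sketch shows one way that gating could work, under stated assumptions: Features mirrors features DATA with the proposal's defaults, the rights set uses the names from the text, and, since the proposal doesn't say which right covers templates, template editing is assumed here to be owner-only. All function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical mapping of features DATA; defaults follow the proposal."""
    anonposts: bool = False
    comments: bool = False
    trackback: bool = False
    templates: bool = False
    clientid: bool = False
    catagory: str = "none"      # 'none', 'light' or 'heavy'


# The rights named in the text; 'owner' can do all of them.
RIGHTS = {"create", "edit", "delete", "edit-others", "delete-others"}


def effective_rights(granted: set) -> set:
    """Expand 'owner' into every right; other rights stand alone."""
    if "owner" in granted:
        return RIGHTS | {"owner"}
    return set(granted)


def may_edit_template(features: Features, granted: set) -> bool:
    # If the blog doesn't support templates, don't call template.edit at all.
    if not features.templates:
        return False
    # ASSUMPTION: template editing is treated as an owner-only operation.
    return "owner" in effective_rights(granted)


def may_edit_post(granted: set, userid: str, post_userid: str) -> bool:
    # 'edit' covers your own posts; 'edit-others' covers everyone else's.
    needed = "edit" if userid == post_userid else "edit-others"
    return needed in effective_rights(granted)
```

Checking features first means a client never has to interpret a failure from a call the server was never going to support.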
You'll notice that I have edit methods but no create methods. In the existing APIs there is no real difference between creating an object and editing one except for the return value (usually the create call returns an id and the edit call returns a boolean). I don't really see the need for such a distinction; in my API, if the object doesn't exist when you attempt to edit it, it is created. Even though this means the edit method may end up doing two jobs (creating and modifying), I feel it's cleaner this way. The user interface can hide this, though (if I may use some pseudocode here):
if (action == 'new')
{
  message_box(templateinfo,templatedata,EMPTY);
}
else if (action == 'edit')
{
  templatedata = template.info(auth,blogid,whichtemplate);
  message_box(templateinfo,templatedata,USE_EXISTING_DATA);
}

template.edit(auth,blogid,templatedata);
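Here is a runnable Python version of that pseudocode, paired with a hypothetical in-memory server so the edit-as-create behaviour is visible. TemplateStore, run_dialog and the ask_user callback (standing in for message_box) are invented for illustration; authentication and client data are left out to keep it short.

```python
class TemplateStore:
    """Hypothetical in-memory server showing edit-as-create semantics."""

    def __init__(self):
        self._templates = {}    # (blogid, templateid) -> template dict

    def edit(self, blogid: str, template: dict) -> str:
        # templateid and body are REQUIRED in template DATA.
        if "templateid" not in template or "body" not in template:
            raise ValueError("templateid and body are required")
        tid = template["templateid"]
        # Unknown id: the template is created. Known id: it is replaced.
        self._templates[(blogid, tid)] = dict(template)
        return tid

    def info(self, blogid: str, templateid: str) -> dict:
        return dict(self._templates[(blogid, templateid)])


def run_dialog(store, blogid, action, whichtemplate, ask_user):
    """The UI flow from the pseudocode; ask_user stands in for message_box
    and returns the (possibly edited) template data."""
    if action == "new":
        data = ask_user({})                                 # empty form
    elif action == "edit":
        data = ask_user(store.info(blogid, whichtemplate))  # prefilled form
    else:
        raise ValueError(f"unknown action: {action!r}")
    return store.edit(blogid, data)   # the same call either way
```

Both branches end in the same edit call; only the initial contents of the form differ.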
Category support is not easy. My own software (mod_blog), for instance, has very limited support for categories (which I call classifications); it pretty much keeps track of them as a comma-delimited, more or less free-form list, so in that case I'd like category support to be pretty light. But Movable Type seems to have a heavier-weight interface for categories. We can define support for the heavier Movable Type category interface:
catagory DATA
{
  catagoryid : REQUIRED,
  name       : REQUIRED,
  primary    : BOOLEAN
}

STRING catagory.edit    // returns catagoryid
(
  auth     : REQUIRED auth DATA,
  blogid   : REQUIRED,
  catagory : REQUIRED catagory DATA,
  client   : client DATA
)

catagory DATA catagory.info
(
  auth       : REQUIRED auth DATA,
  blogid     : REQUIRED,
  catagoryid : REQUIRED,
  client     : client DATA
)

BOOLEAN catagory.delete
(
  auth       : REQUIRED auth DATA,
  blogid     : REQUIRED,
  catagoryid : REQUIRED,
  client     : client DATA
)
But this is overkill for Blogger and my own software. I decided to hedge, and in features DATA I defined the catagory feature as an enumeration: none; light, for blogs like Blogger and my own where categories are simple strings; and heavy, for blogs like Movable Type, where categories are more integral to the system. The intent is that a system with a catagory enumeration of none or light won't have to support the above portion of the API.
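As a sketch of how a server might honour the three levels when a post arrives, consider the Python function below. It is hypothetical, and it assumes two details the proposal leaves open: that in light mode the client sends one comma-delimited string whose names double as ids, and that in heavy mode the ids must already exist (created with catagory.edit).

```python
def normalize_categories(mode, value, known_ids=frozenset()):
    """Turn client input into the list stored in post DATA's catagoryid[].

    mode is the blog's features.catagory value ('none', 'light', 'heavy')."""
    if mode == "none":
        return []                       # blog has no categories; ignore input
    if mode == "light":
        # ASSUMPTION: free-form names double as ids, sent as one
        # comma-delimited string such as "food, travel".
        return [name.strip() for name in value.split(",") if name.strip()]
    if mode == "heavy":
        # Ids must already exist, having been created with catagory.edit.
        unknown = [cid for cid in value if cid not in known_ids]
        if unknown:
            raise ValueError(f"unknown category ids: {unknown}")
        return list(value)
    raise ValueError(f"bad catagory mode: {mode!r}")
```

Only the heavy branch needs the catagory.* calls at all, which is the point of the enumeration.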
So now we come to the whole point of blogging: posts. The existing APIs assume text-based entries; you can shoehorn in other types, but it's not exactly what I would call clean (and the MetaWeblog API has definite ideas of what constitutes a post, some of which don't map well to other blogging software, like … oh … my own!). There is also wide variation in metadata support (titles, categories, timestamps, etc.). Quite the mess.
post DATA
{
  timestamp : DATE-ISO8601 = currenttime(),
  userid,
  author    : user DATA =
  {
    userid   = 'anoncoward',
    fullname = 'Anonymous Coward'
  },
  title,
  catagoryid[],
  templateid,
  permalink,
  parentid,
  childid[],
  trackback DATA[]
  {
    title,
    excerpt,
    url,
    blog_name
  },
  body      DATA[]
  {
    content-type : REQUIRED,
    data         : REQUIRED BINARY,
    name,
    content-encoding
  },
  status    DATA
  {
    publish       : BOOLEAN = 'true',
    syndicate     : BOOLEAN = 'true',
    allowcomments : BOOLEAN = 'false',
    anoncomments  : BOOLEAN = 'false',
    iscomment     : BOOLEAN = 'false',
    comments      : NUMBER
  }
}

filter DATA
{
  startdate  : DATE-ISO8601 = blog.startdate,
  enddate    : DATE-ISO8601 = currenttime(),
  number     : NUMBER = 100,
  startpostid,
  endpostid,
  published  : BOOLEAN = 'true',
  syndicated : BOOLEAN = 'true'
}

STRING post.edit        // returns postid
(
  auth   : REQUIRED auth DATA,
  blogid : REQUIRED,
  post   : REQUIRED post DATA,
  client : client DATA
)

post DATA post.info
(
  auth   : REQUIRED auth DATA,
  blogid : REQUIRED,
  postid : REQUIRED,
  client : client DATA
)

BOOLEAN post.delete
(
  auth   : REQUIRED auth DATA,
  blogid : REQUIRED,
  postid : REQUIRED,
  client : client DATA
)

STRING[] post.listids
(
  auth   : REQUIRED auth DATA,
  blogid : REQUIRED,
  filter : REQUIRED filter DATA,
  client : client DATA
)

post DATA[] post.list
(
  auth   : REQUIRED auth DATA,
  blogid : REQUIRED,
  filter : REQUIRED filter DATA,
  client : client DATA
)
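The proposal gives filter DATA's fields and defaults but not exactly how they combine, so here is one plausible reading of post.listids as a Python sketch. The Post and Filter classes carry only the fields the filter needs; treating published and syndicated as "must match" flags, returning newest posts first, and skipping startpostid and endpostid are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Post:
    """Just the post DATA fields the filter looks at (hypothetical mapping)."""
    postid: str
    timestamp: datetime
    publish: bool = True
    syndicate: bool = True


@dataclass
class Filter:
    startdate: datetime                 # proposal's default: blog.startdate
    enddate: Optional[datetime] = None  # None means currenttime()
    number: int = 100
    published: bool = True
    syndicated: bool = True


def list_ids(posts: List[Post], f: Filter) -> List[str]:
    """A possible post.listids: ids of matching posts, newest first,
    at most f.number of them. startpostid/endpostid are left out."""
    end = f.enddate or datetime.now(timezone.utc)
    hits = [p for p in posts
            if f.startdate <= p.timestamp <= end
            and p.publish == f.published
            and p.syndicate == f.syndicated]
    hits.sort(key=lambda p: p.timestamp, reverse=True)
    return [p.postid for p in hits[:f.number]]
```

post.list would apply the same filter and return the full post DATA instead of just the ids.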
If you look closely at post DATA you'll notice some weird things about it. First off, why have both userid (which indicates the author of the post) and author (of type user DATA)? Then there's parentid and childid. The intent of this weirdness is comment support. There really isn't much difference between a regular post and a comment on a post: both are entries written by a person. The major difference is that a post won't contain a parentid (well, I suppose it could) while comments on that post will (it being the postid of the post the comment applies to). Threaded comments fall out of this if you allow a parentid to be the id of a comment itself.
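To show how threading falls out of parentid alone, here is a small Python sketch that rebuilds the childid[] lists and walks the resulting tree. It assumes each entry is a dict carrying its postid (post DATA doesn't list that field, but post.edit returns one); build_threads and render are hypothetical names.

```python
from collections import defaultdict


def build_threads(entries):
    """entries: dicts with 'postid' and an optional 'parentid'.
    Returns (roots, children), where children maps an id to the ids of
    its replies, i.e. the childid[] list for that entry."""
    children = defaultdict(list)
    roots = []
    for e in entries:
        parent = e.get("parentid")
        if parent:
            children[parent].append(e["postid"])   # a comment (or a reply)
        else:
            roots.append(e["postid"])              # a top-level post
    return roots, dict(children)


def render(postid, children, depth=0, out=None):
    """Depth-first walk: each reply is indented under what it replies to."""
    if out is None:
        out = []
    out.append("  " * depth + postid)
    for child in children.get(postid, []):
        render(child, children, depth + 1, out)
    return out
```

A comment on a comment is just another entry whose parentid happens to point at a comment, so no extra machinery is needed.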
The intent of having both userid and author is to allow anyone to post comments (if permitted). This could also be used to allow anyone to make posts without having to be added as a user first! And why bother with a separate API for comments when it would mostly duplicate the posts API?
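One way a server could decide who wrote an incoming entry is sketched below in Python. The rule that a registered userid wins, and that otherwise the supplied author (or the Anonymous Coward default) is used only when anonymous entries are allowed, is an assumption. So is the idea that anon_allowed would come from features.anonposts for posts and status.anoncomments for comments. resolve_author is a hypothetical name.

```python
# The proposal's default author for entries without a registered user.
ANON = {"userid": "anoncoward", "fullname": "Anonymous Coward"}


def resolve_author(post, known_users, anon_allowed):
    """Decide who wrote an incoming post or comment.

    post: dict that may hold 'userid' and/or 'author' (a user DATA dict).
    known_users: userids registered on the blog.
    anon_allowed: ASSUMPTION, features.anonposts for posts or
    status.anoncomments for comments."""
    userid = post.get("userid")
    if userid:
        if userid not in known_users:
            raise PermissionError(f"unknown user {userid!r}")
        return {"userid": userid}
    if not anon_allowed:
        raise PermissionError("anonymous entries are not allowed")
    # Supplied author details override the Anonymous Coward defaults.
    return {**ANON, **post.get("author", {})}
```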
I defined the body of post DATA as an array, each element containing a content type and data, to allow uploading not only any type of data but multiple types of data in one post. On my own blog I often include images with my posts, so this lets me include both the text and the images (using the optional name field to specify the filename on the server end) in one self-contained call, which is something else I've noticed the other APIs haven't addressed.
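Here is a Python sketch of a client assembling such a multi-part body: the entry text plus an attached image, with content types guessed from the filenames. Since a text-only transport can't carry raw bytes, a second helper base64-encodes each part and records that in content-encoding. That use of content-encoding is an assumption, as are the helper names.

```python
import base64
import mimetypes


def body_part(data: bytes, content_type: str, name=None) -> dict:
    """One element of post DATA's body[] array."""
    part = {"content-type": content_type, "data": data}
    if name:
        part["name"] = name             # filename to use on the server end
    return part


def build_body(text: str, attachments: dict) -> list:
    """attachments maps a server-side filename to its raw bytes."""
    parts = [body_part(text.encode("utf-8"), "text/html; charset=utf-8")]
    for name, data in attachments.items():
        ctype, _ = mimetypes.guess_type(name)
        parts.append(body_part(data, ctype or "application/octet-stream", name))
    return parts


def for_text_transport(parts: list) -> list:
    """ASSUMPTION: base64-encode every part for a text-only transport and
    note that in content-encoding so the server can reverse it."""
    out = []
    for p in parts:
        q = dict(p)
        q["data"] = base64.b64encode(p["data"]).decode("ascii")
        q["content-encoding"] = "base64"
        out.append(q)
    return out
```

The text and the image travel together, so a single post.edit call can publish both.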
Is this better than what's out there now? I don't know. My own blogging software doesn't support any of the existing APIs (in fact, I primarily use email to add entries, taking the metadata from existing email headers, like Subject: for the title, and from some headers of my own). But I think I've covered most of the territory, plus added some features I felt were missing or underdeveloped, and I hope that by writing this I can get some discussion going. In the end, though, it's the code that speaks, not the spec.
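For comparison, email already carries most of what post DATA needs. The Python sketch below maps a message onto a post-like dict using only the standard email package: Subject becomes the title, Date the timestamp, and each leaf MIME part a body element. This is not how mod_blog does it; in particular the X-Categories header is a hypothetical stand-in, not one of the headers mod_blog actually uses.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def email_to_post(raw: str) -> dict:
    """Map an email message onto a post DATA-like dict (a sketch)."""
    msg = message_from_string(raw)
    post = {"title": msg.get("Subject", "")}
    if msg["Date"]:
        post["timestamp"] = parsedate_to_datetime(msg["Date"]).isoformat()
    if msg["X-Categories"]:             # HYPOTHETICAL custom header
        post["catagoryid"] = [c.strip() for c in msg["X-Categories"].split(",")]
    body = []
    for part in msg.walk():             # a single-part message yields itself
        if part.is_multipart():
            continue
        entry = {"content-type": part.get_content_type(),
                 "data": part.get_payload(decode=True) or b""}
        if part.get_filename():
            entry["name"] = part.get_filename()
        body.append(entry)
    post["body"] = body
    return post
```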