Static Publication of Dynamic Content

April 24, 2007 7:31 PM
Related Categories: Web Dev

We have a client who needs to take the content and pages of our ColdFusion CMS and store them as static HTML on another site, and possibly another server. Whenever we get a request like this, I like to find a way to build the functionality into the core product if that is feasible. In reviewing their needs I found a few possible ways to go.

One option: whenever a client creates, updates, or deletes a page in the CMS, publish an HTML-only version of it to a designated area, perhaps with the option of FTPing it to another server as well.

The files are all named .cfm and link to each other extensively. To save them as static HTML we should rename them to .html, which means finding and replacing those references in the content: about.cfm becomes about.html, and links to about.cfm become links to about.html. But what happens if a link to a .cfm file isn't local but points to an external ColdFusion-based site (Adobe, for example)? I don't want that to be overwritten.
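
A minimal sketch of the "local links only" rewrite, written in Python rather than ColdFusion just to keep it short. Everything named here is an assumption for illustration: the LOCAL_HOSTS set, the www.example.com host, and the rewrite_url and rewrite_links helpers are hypothetical, and matching href attributes with a regular expression is a simplification that a real HTML parser would handle more robustly.

```python
import re
from urllib.parse import urlsplit

# Hypothetical: host names that count as "our" site.
LOCAL_HOSTS = {"www.example.com"}

# Matches href="..." or href='...' (simplified; not a full HTML parser).
HREF_RE = re.compile(r'''(href\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def rewrite_url(url, local_hosts=LOCAL_HOSTS):
    """Return url with .cfm swapped for .html, but only for local links."""
    parts = urlsplit(url)
    # Leave non-web schemes (mailto:, ftp:, javascript:) alone.
    if parts.scheme and parts.scheme.lower() not in ("http", "https"):
        return url
    # A host that is not ours means an external site: do not touch it.
    if parts.netloc and parts.netloc.lower() not in local_hosts:
        return url
    if not parts.path.lower().endswith(".cfm"):
        return url
    new_path = parts.path[: -len(".cfm")] + ".html"
    return parts._replace(path=new_path).geturl()


def rewrite_links(html, local_hosts=LOCAL_HOSTS):
    """Rewrite every href in an HTML string using rewrite_url."""
    def repl(match):
        prefix, quote, url = match.groups()
        return prefix + quote + rewrite_url(url, local_hosts) + quote
    return HREF_RE.sub(repl, html)


if __name__ == "__main__":
    page = ('<a href="about.cfm">About</a> '
            '<a href="http://www.adobe.com/index.cfm">Adobe</a>')
    print(rewrite_links(page))
```

Relative links and links to a listed host are rewritten; anything pointing at another host passes through unchanged. Query strings are kept as-is, which still leaves open what a static page should do with them.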

What happens if they simply modify a CSS file or upload an image or PDF? I really should have a listener for changes to the file system as well. If a file that isn't .cfm is added or removed, copy it to or remove it from the external staging area, then repeat the task via FTP if FTP is needed.
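
One way to approximate such a listener without any server-level hooks is to poll: take a snapshot of the non-.cfm files, compare it with the previous snapshot, and copy or remove whatever changed. The Python sketch below assumes exactly that polling approach (it is not a true file system listener), and the snapshot, diff_snapshots, and sync_assets names are hypothetical.

```python
import os
import shutil


def snapshot(root):
    """Map relative path -> (mtime_ns, size) for every non-.cfm file under root."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".cfm"):
                continue  # pages are published separately as .html
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            state[os.path.relpath(full, root)] = (info.st_mtime_ns, info.st_size)
    return state


def diff_snapshots(old, new):
    """Return (added_or_changed, removed) relative paths between two snapshots."""
    changed = sorted(p for p in new if old.get(p) != new[p])
    removed = sorted(p for p in old if p not in new)
    return changed, removed


def sync_assets(source, staging, old_state):
    """Copy changed assets into staging, delete removed ones, return new state."""
    new_state = snapshot(source)
    changed, removed = diff_snapshots(old_state, new_state)
    for rel in changed:
        dest = os.path.join(staging, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(os.path.join(source, rel), dest)
    for rel in removed:
        dest = os.path.join(staging, rel)
        if os.path.exists(dest):
            os.remove(dest)
    return new_state
```

Only files that appeared in an earlier snapshot are ever deleted from staging, so generated .html pages there are left alone. The returned state would need to be stored between runs (for example in a small file), and the same changed/removed lists could drive the FTP step.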

What happens if the FTP transfer fails? Should there be an automatic retry built in? What if someone accidentally removes files at the remote FTP site? Should that be automatically detected? Perhaps a function should be added to override the defaults and push a full publication to the server when needed.
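
For the retry question, a small wrapper with exponential backoff is one possible shape. This Python sketch is an assumption about how it might look, not something the CMS does today: with_retries and its parameters are hypothetical, and the action passed in stands for whatever performs the actual upload.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0,
                 retry_on=(OSError,), sleep=time.sleep):
    """Call action(); on a listed error, wait and try again.

    The delay doubles after each failure (base_delay, 2x, 4x, ...).
    The last error is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_upload():
        # Stand-in for a real transfer: fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise OSError("connection dropped")
        return "uploaded"

    print(with_retries(flaky_upload, base_delay=0.1))
```

With Python's ftplib, passing retry_on=ftplib.all_errors would cover FTP protocol errors as well as socket errors. A "push everything" override could then be a loop that calls with_retries once per file in the staging area, which also repairs files deleted by accident on the remote side.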

All of this is complicated by the fact that our CMS right now requires no mappings or virtual folders to run. I really don't want to start building those in as a requirement. Shared hosting support must be maintained.

Add in calendaring and shared data, and static publication starts to be a real pain.

What seems like a simple task becomes much more complex. This specific client doesn't need all the functionality above, but why do it only halfway?

Our CMS excels in its ease of use and strong core features. We don't offer all the functions that other CMSs provide, but we are a fraction of the price and a breeze to install and get started with. No mappings, one config file. I don't want to build in features that require us to change what so many people love about our product.

What have others done regarding this sort of situation?  What would you like to see our CMS do?  There are many other features on our list.  We may have to put off the big version of this until a later date.



Comments

One CMS inserts links into content as something like 'href="reference.cfm?id=4545"', and it scans for 'href="link.cfm?id=' at runtime, looks up the id number, and replaces the dummy link with the proper domain and path.

This lets the resource (a page, file, etc.) be moved, changed, renamed, and so on. In effect, the link is simply a doubly-dereferenced "pointer" to the data. This also makes "your" links stand out from any others the user may have added.

Just food for thought.
# Posted By Michael Long | 4/24/07 9:18 PM
Joshua, I would question why they want it published as HTML.

Is it because of performance? Has someone told them it's good for SEO?

On changing .cfm links - I guess you could limit it to rewriting links to the local site.

But it seems like a lot of overhead in terms of development time, and it adds more complexity to the system for a requirement which isn't that widely requested (unless I'm wrong and it is?).

Finally, perhaps the solution is to look outside of the CMS. That is, the CMS would be responsible for copying pages to a staging area, and then something else (a directory watcher or rsync) would be responsible for syncing with a production machine. Just a thought.
# Posted By Kola | 4/25/07 6:55 AM
All good questions/comments. The need is that sometimes people want the functionality but don't have the ability to store anything on the final destination server except HTML. It would be a handy feature to sell to clients who have no control over hosting and want to use the lowest-cost host possible, but as you have noted there is quite a bit involved.

Ultimately the client's need is only a few pages. Not a big deal. But as I mentioned before, it is good to consider whether a bit of custom code should be rolled into a full feature. Sometimes it should, and sometimes it just isn't going to work out.

I can see this may force us to split the product line. One version would be a hosted version we maintain and the other the traditional product. Not sure we are ready to do that yet.
# Posted By Joshua | 4/25/07 8:40 AM
Ah. You've just hit the BIG question in the CMS space. Before I left for greener pastures, I was playing with all kinds of solutions from simplistic (wget) to complex (dependency tracking and chaining). The big problem was those modules that reloaded a given page with a slightly different query string (our calendar stuff was a big problem). I had varying degrees of success with each approach, but I was really starting to think that the only way to really handle this well was to engineer for it from the ground up.

In our case, the primary reason for the push was performance and I had already written a messaging system that allowed us to synchronize multiple servers so we just had the customer who really needed it throw more servers at the problem.

As far as I know, no more progress has been made on the "real" problem. I'd love to know what you do if you find a solution that really works for you.
# Posted By Rob Wilkerson | 4/25/07 8:47 AM
Anyone use any desktop software to just download a site? It may work as a quick option for them. Gotta love the quick and dirty solutions! I found a few, but honestly a few of them don't look safe. Will they install spyware at the same time? hmmm
# Posted By Joshua | 4/25/07 5:37 PM
Yep, that's the wget solution I was speaking of. It will recursively suck down a site. You could then pipe the file names through find or something similar to replace the .cfm extension with .htm(l), if you choose. wget is standard for Unix, but there are Windows variants out there. A quick Google should offer a number of solutions.
# Posted By Rob Wilkerson | 4/25/07 5:42 PM
Ah, would that help with images as well? I will look at that some more tomorrow, thanks!
# Posted By Joshua | 4/25/07 6:02 PM
