At the core of every managed website is a content management system (CMS). Early in the development process, a platform and product are selected. There are currently over 400 options, so it is important to find the tool set that most closely matches the defined business requirements.
Choose an overly simple solution, and you will find yourself in the unfortunate position of having to port it to a more capable system. Go too complex, and you become a slave to the implementation and its tool experts, however expensive or flaky they may be.
Choosing the right WCM system for your website, or indeed for your enterprise, can be both confusing and frustrating: you have over 500 systems to choose from, with more being added daily.
Whether that system is complex or simple (e.g. hand editing), your ability to implement and use your CMS is essential to a successful site. You must be able to enable content providers and editors, however inexperienced, to perform website updates.
Here is a round-up of the posts in our community that relate to Web Content Management (WCM).
No fiesta
Every time an acquisition is announced, it is time to celebrate. Well, that depends on who you are.
As a shareholder of an acquired company, I have learnt to love acquisitions. As a manager, I have learnt, several times, to rebuild teams, business and client trust on the ruins of acquired companies. And I have learnt how tough, and how rare, a successful acquisition is: whether you are an investor, an employee or anything else, it is always a challenge.
Mission accomplished?
So I am not the kind to say “congrats” when an acquisition is announced, except, of course, to the team in charge of the deal. They have “accomplished the mission”, and anyone who has been in this business knows how tricky that can be. So yes, congratulations to the management team of Day Software: they have sold the company very successfully.
But from my point of view, there is not much to celebrate in an acquisition, especially for many of the employees and many of the clients. The most difficult part is still to come. The acquisition matters far less than the integration. It is like celebrating the signature of a contract, whereas what really matters is the go-live of the project.
Acquisition means “risky business”
OK, let’s write “challenging” instead of “risky” to stay positive.
I have not read much today about how a mid-size company with a strong Swiss attitude and an open source philosophy will fit into a mainly US-centric, international, proprietary software company. I have also learnt that during acquisitions people matter, but not that much, actually. Not because you don’t want them to, but because you cannot pay much attention to details, and that is what individuals become during an acquisition. Of course, some individuals will find great opportunities thanks to the deal; that is not the point. The point is that an acquisition is not a fiesta for everyone, far from it: it mainly means “risky business”.
And one of Day’s key advantages has been a number of very gifted, skilled and committed individuals. I wonder how Adobe will handle “The surprising truth about what motivates us”.
Day Software was one of the rare independent high-end WCM vendors, and it will now be just one of the products of a major software company. That is a major change. As with any similar acquisition, uncertainty will prevail for weeks if not months, at least for many employees, even if Adobe, of course, argues the opposite. Many are speculating, and will keep speculating, about product integration, strategic fit, risks, advantages, constraints and so on. I don’t care that much, since my employer is a competitor and I have to focus on my clients, my team and our own product. But to finish this post on a lighter note, I cannot resist encouraging Day’s shareholders to accept this acquisition. Let me be more specific:
Four Reasons to sell your shares to Adobe
1. The price is right
The company’s share price was extremely high before the acquisition, and I assumed the market was expecting something irrational, sorry, I meant “exceptional”. And the market is so often right, so now it really is time to act, guys. Frankly, I am impressed. I could write pages on why this valuation looks so unreal to me (maybe because I come from a different world), but frankly, just sell your shares at that price. To exaggerate only a little, you are more likely to be hit by an asteroid tomorrow morning than to get a better deal anytime soon.
2. That’s good for my business
Everybody knows an acquisition is always good for the competition’s business, at least for the coming months. Beyond a year, as usual in this kind of business, it does not matter so much: that is another time frame, and we always have to face competitors of many kinds. That’s business. What matters is adaptation and anticipation, not conjecture. So for the time being, I believe this is good news for my employer’s business development, as we have very frequently run into Day Software in both Europe and North America over the last few months.
3. You will give Adobe a cooler image
I really like the spirit and the values of the Day team. I don’t believe in miracles, but I hope they will influence Adobe in a positive way somewhere. At least they won’t be a bad influence!
4. This summer is as boring as any summer, so thanks for the show
For various reasons, and somewhat surprisingly, daily business always thrives in July and August, but market news is usually very boring during these sunny weeks in Europe. At last, some news really worth talking about during the summer break. Cool. And just before the CMS geek-up sessions! Very cool.
So please do me a favor: just sell your shares to Adobe.
Further reading:
Analysis, interesting comments and some (inevitable) speculation from our WCM industry gurus: Adobe to acquire Day – First Take ECM perspective
Great critique, as usual, from Seth Gottlieb: Will Day stay committed to web standards under Adobe’s ownership?
CMS Wire article: Web CMS: Adobe Buys Day Software for US$ 240 Million
Boris-Magnolia’s own publicity, but with several excellent remarks: http://www.betterfasterbigger.com/2010/07/day-to-be-acquired-by-adobe.html
Jon on tech blog post: http://jonontech.com/2010/07/28/a-fine-day-for-adobe/
Jeff Potts, who was very quick to write about the acquisition: Adobe acquires Day Software for $240 million
I promised last time to show a simple way to render CRX content as PDF. The technique in question involves using a PDF form as a ready-made container, into which form data is imported using XFDF. XFDF is the XML version of Adobe's Forms Data Format (FDF), a file format specifically designed for importing data into, and exporting data from, PDF forms.
The way it works is simple: suppose you have a PDF form that you want to populate with data. You merely need to create a small data file (in XFDF format) and put it on the server. When a user requests the data file (which has a MIME type of "application/vnd.adobe.xfdf"), Acrobat Reader (or the Reader browser plug-in) detects that form data needs to be imported into a form. The XFDF file itself contains a pointer to the actual form to be used. Reader fetches the form, imports the form data into it, and renders the result as a PDF file containing the data. It all happens transparently, and the user needs only Acrobat Reader (not a full copy of Acrobat Professional).
In the example I'm going to show below, we generate the XFDF file dynamically on the server, via a script called (what else?) xfdf.esp. We'll get to that in a minute.
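For readers who want to see what such a data file contains, here is a minimal sketch, in plain JavaScript, of building an XFDF document as a string. It is not the author's xfdf.esp; it assumes the basic XFDF layout (an f element whose href points at the form, followed by a fields list of name/value pairs). The form URL and the field names are hypothetical and would have to match the fields defined in the actual PDF form. In an ESP script, the resulting text would be sent with the application/vnd.adobe.xfdf MIME type so that Reader picks it up.

```javascript
// Minimal sketch of building an XFDF document as a string.
// Assumptions: the form URL and field names below are hypothetical; the
// field names must match the names of the fields in the PDF form.

// Escape the five characters that are special in XML text and attributes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// formUrl: where Reader should fetch the PDF form (the <f href="..."> pointer)
// fields:  plain object mapping form-field names to values
function buildXfdf(formUrl, fields) {
  var lines = [];
  lines.push('<?xml version="1.0" encoding="UTF-8"?>');
  lines.push('<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">');
  lines.push('  <f href="' + escapeXml(formUrl) + '"/>');
  lines.push("  <fields>");
  for (var name in fields) {
    if (!Object.prototype.hasOwnProperty.call(fields, name)) continue;
    lines.push('    <field name="' + escapeXml(name) + '">' +
      "<value>" + escapeXml(fields[name]) + "</value></field>");
  }
  lines.push("  </fields>");
  lines.push("</xfdf>");
  return lines.join("\n");
}

// Usage with a hypothetical form location and two hypothetical fields
var xfdf = buildXfdf("http://localhost:7402/apps/films/film.pdf", {
  Title: "Terminator 2",
  Director: "James Cameron"
});
console.log(xfdf);
```

Escaping every value matters here: a film title containing an ampersand would otherwise produce malformed XML that Reader would refuse to import.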
The example we're going to talk about assumes that there is content in CRX (under a path of /content/films) that looks something like this:

This particular content node is named terminator_2. It lives under /content/films/ in my CRX repository.
Notice, in the above list, that there is a property (at the bottom) called sling:resourceType, set to a value of "films". This tells CRX to look under /apps/films for any scripts needed to render the content.
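As a rough mental model only, the lookup can be pictured as turning the resource type plus the request extension into candidate script paths, searched under /apps first and then /libs. This is not Sling's actual resolver, which also weighs selectors, HTTP method names, resource super types and every registered script engine; the helper name candidateScripts is hypothetical.

```javascript
// Rough mental model only, NOT Sling's real script resolution.
// Assumption: a GET request whose extension names the script, looked up
// in the default Sling search path (/apps first, then /libs).
var SEARCH_PATH = ["/apps", "/libs"];

// Hypothetical helper: map a resource type and extension to script paths.
function candidateScripts(resourceType, extension) {
  return SEARCH_PATH.map(function (root) {
    return root + "/" + resourceType + "/" + extension + ".esp";
  });
}

candidateScripts("films", "csv");
// returns ["/apps/films/csv.esp", "/libs/films/csv.esp"]
```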
In previous blogs, I've shown how to write scripts that render this content as HTML, SVG, or CSV. Right now, what we need is an XFDF renderer. That turns out to be pretty easy to set up.
First, we need to create a PDF form to hold our data. In the Acrobat Professional forms editor, such a form looks like this:
I've shown how easy it is to push spreadsheet data into CRX (in such a way that there is one content node per row of data, where properties on that node correspond to column data). The reverse is also possible: It's easy to write a script that converts sibling nodes to row data formatted as CSV (comma-separated values per RFC 4180). Such a script, csv.esp, looks something like this:
The rules for escaping data for CSV are simple. First, every double-quote (") character in a data string must be doubled (""). Second, if the data contains a comma, the entire string must be wrapped in double quotes. The same is true for any data that contains double quotes or line breaks (which RFC 4180 defines as CRLF, i.e. carriage return followed by linefeed). The following very simple function enforces these escaping rules:
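// Escape field data per RFC 4180
function escapeData( data ) {
  // treat a missing property as an empty field
  // (otherwise String(undefined) would emit "undefined")
  if ( data == null )
    data = "";
  // replace " with ""
  data = String(data).replace( /"/g, "\"\"" );
  // if data contains a comma, a double quote, or a line break
  // (CRLF, or a lone CR or LF), wrap the entire thing in double quotes
  var escapables = /[",\r\n]/;
  if ( escapables.test( data ) )
    return "\"" + data + "\"";
  return data;
}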
The function that actually converts nodes to records is very straightforward as well:
function nodesToCSV( nodes, propertyNames ) {
  var records = new Array( );
  for ( var i = 0; i < nodes.length; i++ ) {
    var aRecord = new Array( );
    // suck in the data for each property:
    for ( var k = 0; k < propertyNames.length; k++ ) {
      var data = nodes[ i ][ propertyNames[ k ] ];
      var escaped = escapeData( data );
      aRecord.push( escaped );
    }
    records.push( aRecord.join( "," ) );
  }
  var CRLF = String.fromCharCode(13) +
             String.fromCharCode(10);
  return records.join( CRLF );
}
Note that we need to explicitly provide the function a list of property names, rather than (say) let the function iterate through property names on an introspective basis. The reason for this is that if we simply try gathering property names with a for/in loop, we will get back property names in no particular order. And the order will, in fact, vary from content node to content node even if all of the content nodes have properties with exactly the same names. The unorderedness of the properties (as obtained through simple iteration) would scramble the column data in our CSV file. We don't want that. Hence, we pass in an array of property names, and march through the array in orderly fashion when pulling property data from each node.
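To try the same idea outside CRX, here is a self-contained sketch in which plain JavaScript objects stand in for JCR nodes. It follows the same escaping rules as above and, as an assumption not present in the original script, prepends a header row built from the property names. The two sample objects deliberately list their properties in different orders, and the fixed propertyNames array still keeps the columns aligned. The names toCsv and escapeField are hypothetical.

```javascript
// Self-contained sketch: plain objects stand in for JCR nodes, and a
// header row (not produced by the original nodesToCSV) is prepended.
var CRLF = "\r\n";

// Same escaping rules as escapeData() (RFC 4180, section 2)
function escapeField(data) {
  var s = data == null ? "" : String(data);
  s = s.replace(/"/g, '""');
  return /[",\r\n]/.test(s) ? '"' + s + '"' : s;
}

function toCsv(items, propertyNames) {
  var lines = [propertyNames.map(escapeField).join(",")]; // header row
  items.forEach(function (item) {
    // Walk the fixed list of names, so column order never depends on
    // the order in which an object happens to enumerate its properties.
    lines.push(propertyNames.map(function (name) {
      return escapeField(item[name]);
    }).join(","));
  });
  return lines.join(CRLF);
}

// The two hypothetical records list their properties in different orders.
var films = [
  { Title: "First Film", Year: "2001" },
  { Year: "2002", Title: 'He said "hi", twice' }
];
console.log(toCsv(films, ["Title", "Year"]));
```

The second title needs both rules at once: its inner quotes are doubled, and because it contains a comma and quotes, the whole field is wrapped in quotes.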
When I placed csv.esp in my repository under /apps/films and then navigated to http://localhost:7402/content/films.csv, CRX dutifully fired my script and produced a CSV file containing all of the data from my /films content nodes, causing my browser (in turn) to inform me that I was downloading a file of type "csv" (it then asked me what program I wanted to use to open the file; I specified scalc.exe, and OpenOffice dutifully loaded the file as a spreadsheet).
So far, I've shown how to render /films data as HTML, SVG, and CSV. Next time, I want to show a simple trick for rendering the data as PDF. It's easier than you think!
It's always good to get a glimpse into the approaches taken by non-OSS JCR implementations. In a recent technical article on the developerWorks website, Malarvizhi Kandasamy describes how IBM goes about JCR full-text search. The actual engine is Juru, which is a Java library developed by the IBM Haifa Research Lab.
According to the article, Juru is capable of some natural language processing, such as stemming and finding similar spellings.
IBM uses a JCR compliant repository in a number of their products, e.g. Lotus Web Content Management or WebSphere Portal.
A few days ago, I talked about how to "shred and store" a spreadsheet -- i.e., how to push rows of a spreadsheet into individual nodes in CRX (one node per row, with column data stored as properties). I also gave JavaScript code for doing this in an OpenOffice macro. For testing purposes, I used the CSV file a1-film.csv, representing 1741 movies catalogued by Georgia Tech's College of Computing.
After running my OpenOffice macro on the Georgia Tech CSV file, my CRX repository now contains movie data (Title, Director, Year, etc.) for 1741 films, each film with its own nt:unstructured node under the path /content/films/. In the CRX Content Explorer, a given node (in this case, the node at http://localhost:7402/content/films/terminator_2) looks something like this:
Since version 2.0, CRX has come with CRXDE Lite (CRX Development Environment - Lite), a web-based tool that eases the development of CRX-based applications. CRXDE Lite is implemented using the ExtJS JavaScript library and aims to replace the CRX 1.x Content Explorer with a modern AJAX-based repository browser and editor, but it also provides improved means for searching, code editing, integration with code version management, and handling of non-scripted code. As a tool primarily for developers, it also comes with server-side development functionality such as compilation of Java code, OSGi bundle creation and autodeployment, a project wizard, etc.
In a recent blog, I talked about how easy it is to store snippets of text from OpenOffice in a CRX repository using a little bit of JavaScript and the Sling REST API. While being able to store arbitrary bits of text this way is certainly useful, it would be even more useful to be able to store spreadsheet data. Of course, storing a spreadsheet in CRX, per se, is not much of a challenge: with WebDAV, it's a matter of drag and drop. But storing an entire spreadsheet as a single monolithic content item doesn't necessarily give you the greatest content-management bang for the buck. Often, what you really want to do is granularize the spreadsheet into records (or row data), and store individual rows as content items. (You could take it further and store individual cells as content items, but that would probably be overkill for most situations, although there's certainly nothing preventing you from doing it.)
In the database world, where decisions often have to be made as to how best to decompose an XML document when mapping it to tables in a database, this general process (of decomposing a large document along the lines of its natural internal fine-structure) is known as shredding. What would be handy is to have an OpenOffice macro that could shred a spreadsheet into rows, and push the rows into nodes in CRX. That's what I propose to show you right now.
It turns out to be pretty easy to parse a spreadsheet in an OpenOffice macro. Using JavaScript:
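// Make the UNO classes visible to the JavaScript (Rhino) macro
importClass(Packages.com.sun.star.uno.UnoRuntime);
importClass(Packages.com.sun.star.sheet.XSpreadsheetDocument);
// First, get the document object
// from the scripting context
var oDoc = XSCRIPTCONTEXT.getDocument();
// Next, get the XSpreadsheetDocument
// interface from the document
var xSDoc = UnoRuntime.queryInterface(XSpreadsheetDocument, oDoc);
// Then get a reference to the sheets for this doc
var sheets = xSDoc.getSheets();
// get Sheet1
var sheet1 = sheets.getByName("Sheet1");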
Once you have the sheet reference, you can use it to obtain a cell reference (column and row are zero-based indices):
var cell = sheet1.getObject().getCellByPosition( column, row );
The cell, in turn, contains data, which (depending on whether you're dealing with a native OpenOffice spreadsheet or a freshly imported CSV file) can be a floating-point value, a string, or something else. For purposes of this discussion I'm going to assume that you've just imported a CSV or tab-delimited file into OpenOffice, in which case all cells will automatically contain string data. To get the string data from a cell in a freshly imported CSV file, you have to do:
var content = cell.getFormula();
At least, that's what works in OpenOffice 3.2.
The general plan of attack, then, is to come up with a function that can parse a row's worth of data out of a spreadsheet; and have another function that can persist a row of data as a content item in CRX. Then it should be possible to create a macro that simply loops over all rows in a spreadsheet and pushes them out to the repository.
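The plan above can be sketched in plain JavaScript, with a two-dimensional array standing in for the spreadsheet so that the logic runs outside OpenOffice. This is an illustration, not the author's macro: it assumes row 0 holds the column names, and persist is a hypothetical callback standing in for the HTTP POST to CRX.

```javascript
// Sketch of the shred-and-store plan with plain data instead of UNO calls.
// Assumptions: table is a 2-D array of cell strings standing in for the
// sheet, row 0 holds the column names, and persist is a hypothetical
// callback standing in for the POST to CRX.
function shredTable(table, persist) {
  var columnNames = table[0];            // read the header row once
  var count = 0;
  for (var r = 1; r < table.length; r++) {
    var record = {};
    for (var c = 0; c < columnNames.length; c++) {
      // one property per column, named after the header cell
      record[columnNames[c]] = table[r][c];
    }
    persist(record);                     // e.g. create one node per row
    count++;
  }
  return count;                          // number of rows persisted
}

// Usage with a two-row table and a callback that just collects records
var saved = [];
var n = shredTable(
  [["Title", "Year"], ["First Film", "2001"], ["Second Film", "2002"]],
  function (record) { saved.push(record); }
);
console.log(n, saved);
```

One design difference from the macro: the header row is read once up front, rather than once per record as persistRow() does, which avoids re-reading row 0 for every film.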
The row-parsing function is pretty straightforward:
function getRow( sheet, rownumber, startColumn, endColumn ) {
  // unwrap the sheet object passed in from getByName()
  var obj = sheet.getObject();
  var record = [];
  // endColumn is exclusive
  for (var k = startColumn; k < endColumn ; k++) {
    var cell = obj.getCellByPosition( k, rownumber );
    var content = cell.getFormula();
    record.push( content );
  }
  return record;
}
Given a reference to a Sheet, along with a row number and the starting and (exclusive) ending column numbers, this function loops through cells and pushes cell values into an array. The returned array represents a row's worth of data.
To persist a row to CRX, we have a function that looks like this:
function persistRow( sheet, rownumber, startColumn, endColumn ) {
  // get first row of data (column names)
  var columnNames = getRow( sheet, 0, startColumn, endColumn );
  // get specified record
  var row = getRow( sheet, rownumber, startColumn, endColumn );
  // build the request
  var request = {};
  request[":nameHint"] = row[2]; // Title
  request["sling:resourceType"] = "films";
  for ( var i = 0; i < columnNames.length; i++) {
    request[ columnNames[ i ] ] = row[ i ];
  }
  // createRequest() and doJavaPOST() are helpers defined
  // elsewhere in the complete macro
  var data = createRequest( request );
  // where to store it
  var url = "http://localhost:7402/content/films/";
  // finally, hit the repository
  var response = doJavaPOST( url, data );
  return response;
}
Notice that the code assumes that the first row of the spreadsheet contains the column names. This was the case with the spreadsheet I used for testing this macro, namely a1-film.csv, representing 1741 movies catalogued by Georgia Tech's College of Computing. Each row in the spreadsheet has information about a particular film, such as its title, the year it was made, its genre, the name of the director, the major actors and actresses, etc.
Without further ado, here is the complete code for the OpenOffice macro:
You'll notice that the code creates a JEditorPane window to act as an error console. When you run the macro, a JOptionPane dialog appears, asking you to supply the number of rows and columns in the spreadsheet. (For the Georgia Tech spreadsheet, you can enter "1741,8", minus quotes.) Once you dismiss the dialog, the code goes to work looping over all the rows in the spreadsheet, posting each row to CRX at a path of http://localhost:7402/content/films/.
Each new node is named according to a :nameHint parameter based on the Title of the film.
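The macro's createRequest() helper is not reproduced in this excerpt, so here is a hypothetical stand-in showing one way the request object built in persistRow() could become an application/x-www-form-urlencoded POST body. Sling's POST servlet treats a target URL ending in "/" (such as http://localhost:7402/content/films/) as a request to create a new child node, deriving the node's name from :nameHint. The function name encodeFormBody is an assumption for illustration.

```javascript
// Hypothetical stand-in for the macro's createRequest() helper:
// encode each own property of params as key=value, joined with "&".
function encodeFormBody(params) {
  var pairs = [];
  for (var key in params) {
    if (!Object.prototype.hasOwnProperty.call(params, key)) continue;
    // encodeURIComponent escapes ":" as %3A and a space as %20
    pairs.push(encodeURIComponent(key) + "=" +
               encodeURIComponent(String(params[key])));
  }
  return pairs.join("&");
}

var body = encodeFormBody({
  ":nameHint": "Terminator 2",
  "sling:resourceType": "films",
  "Title": "Terminator 2"
});
console.log(body);
// %3AnameHint=Terminator%202&sling%3AresourceType=films&Title=Terminator%202
```

The body would be sent with a Content-Type of application/x-www-form-urlencoded; the colon-prefixed :nameHint parameter is consumed by the POST servlet rather than stored as a property.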
Notice also that we designate a sling:resourceType of "films" for each node. (This happens in the persistRow() function.) This fact will be important in a later blog post, when I show how to write server-side scripts that handle various types of requests for film data.
And that's about it: Now you know how to shred a spreadsheet (say that 3 times in a row fast...) and store the results in CRX, using OpenOffice.
Lately I've been doing a fair amount of server-side scripting using ESP (ECMAScript Pages) in Sling. At first blush, such pages tend to look a lot like Java Server Pages, since they usually contain a lot of scriptlet markup, like:
<% // script code here %>
and
<%= // stuff to be evaluated here %>
So it's tempting to think ESP pages are simply a different flavor of JSP. But they're not. From what I can tell, ESP pages are just server pages that get handed to an EspReader before being served out. The EspReader, in turn, handles the interpretation of scriptlet tags and expression tags (but doesn't compile anything into a servlet). Bottom line: ESP is not JSP, and despite the availability of scriptlet tags, things work quite differently in each case.
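To make the difference concrete, here is a toy model (NOT Sling's EspReader) of what ESP processing amounts to: the template is turned into ordinary JavaScript source, in which literal text becomes out.write(...) calls, an expression tag becomes out.write of its value, and scriptlet code is copied through unchanged; the source is then simply run. The names espToJs and renderEsp, and the buffer playing the role of out, are assumptions for illustration.

```javascript
// Toy model, NOT Sling's EspReader: translate an ESP-style template into
// JavaScript source, then run that source with a buffer acting as "out".
function espToJs(template) {
  var re = /<%(=?)([\s\S]*?)%>/g;   // matches <% code %> and <%= expr %>
  var js = "";
  var last = 0;
  var m;
  while ((m = re.exec(template)) !== null) {
    // literal text before the tag is written out verbatim
    js += "out.write(" + JSON.stringify(template.slice(last, m.index)) + ");\n";
    if (m[1] === "=") {
      // expression tag: evaluate and write the result
      js += "out.write(String(" + m[2].trim() + "));\n";
    } else {
      // scriptlet tag: copy the code through unchanged
      js += m[2] + "\n";
    }
    last = re.lastIndex;
  }
  // trailing literal text after the last tag
  js += "out.write(" + JSON.stringify(template.slice(last)) + ");\n";
  return js;
}

// Evaluate the generated source; scope supplies globals such as request.
function renderEsp(template, scope) {
  var buffer = [];
  var out = { write: function (s) { buffer.push(s); } };
  var names = Object.keys(scope);
  var fn = new Function(["out"].concat(names).join(", "), espToJs(template));
  fn.apply(null, [out].concat(names.map(function (k) { return scope[k]; })));
  return buffer.join("");
}

console.log(renderEsp("<p><% var n = 2; %><%= n * 21 %></p>", {}));  // prints <p>42</p>
console.log(renderEsp("Hello, <%= name %>!", { name: "Sling" }));     // prints Hello, Sling!
```

Because scriptlet code is pasted straight into the generated source, a loop opened in one tag and closed in a later one (for example, <% for (...) { %> ... <% } %>) wraps the literal text between them, much as in JSP, but without any servlet compilation step.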
Suppose you want to detect, from an ESP page or a JSP page, what kind of browser a given page request came from. In a Sling JSP page you could do:
<%@taglib prefix="sling" uri="http://sling.apache.org/taglibs/sling/1.0" %>
<sling:defineObjects/>
<html><body>
<%
java.util.Enumeration c = request.getHeaders("User-Agent");
String s = "";
while ( c.hasMoreElements() )
s += c.nextElement();
%>
<%= s %>
</body></html>
But what do you do in ESP? Remember, <sling:defineObjects/> is not available in ESP.
It turns out that Sling automatically (without the need for any directives) exposes certain globals to the JavaScript Context at runtime, and one of them is a request object. Thus, in ESP you'd simply do:
<%
var c = request.getHeaders("User-Agent");
var s = "";
while ( c.hasMoreElements() )
s += c.nextElement();
%>
<%= s %>
Very similar to the JSP version.
So the next question I had was, what are the other globals that are exported into the JavaScript runtime scope by Sling? From what I can determine, the Sling globals available in ESP are:
currentNode
currentSession
log
out
reader
request
resource
response
sling
currentNode is the JCR node underlying the current resource; currentSession is what it sounds like, a reference to the current Session object; log refers to an org.slf4j.Logger; out is the java.io.PrintWriter used to write the response body; reader returns request.getReader(), which allows for reading the request body; request is a reference to the SlingHttpServletRequest; resource is the current Resource; response is, of course, a reference to the SlingHttpServletResponse; and sling is a SlingScriptHelper. All of these are available all the time, throughout the life of any ESP script in Sling.
The nice part about server-side scripting in Sling (one of many nice parts), incidentally, is that you don't have to choose to do just ESP pages or just JSP; you can write an ESP handler for one situation and a JSP for another, and use ESP/JSP in any combination. You're not locked into one technology or the other.
For more information, try the Sling Javadocs here or Day's page of resources here (note, in particular, the list of References on the right).
The current board of directors of the Apache Software Foundation has just been elected - congratulations to:
Roy and Bertrand are colleagues of mine at Day Software.
To find out more about what the board actually does have a look at "How the ASF works".
“Flexible access to people and resources can be enormously powerful in a world driven by changes that, more often than not, lead us in unanticipated directions…we need to become more adept at ‘capability leverage’ – finding and accessing complementary capabilities, wherever they reside in the world, to deliver more value.”
- From “The Power of Pull” by J. Hagel, J. S. Brown and L. Davison
Businesses, in particular in the Western world, are becoming more and more knowledge-intensive, with an increasing part of the workforce engaged in knowledge-based work. A study by The Work Foundation has estimated that we have a 30-30-40 workforce: 30 per cent in jobs with high knowledge content, 30 per cent in jobs with some knowledge content, and 40 per cent in jobs with less knowledge content.
No fiesta
Every time an acquisition is published, it is time to celebrate. Well, it depends for whom.
I have learnt to love acquisition as a shareholder of the acquired company. I have learnt to rebuild teams, business and clients’ trust on the ruins of the acquired companies as a manager, several times. I have learnt to learn how tough, and rare, it was to acquire successfully a company as an investor, an employee or whatever, it is always a challenge.
Mission accomplished?
So I am not the kind to say “congrats” during an acquisition but to the team in charge of the acquisition, of course. They have “accomplished the mission” and anyone who had been in such business knows how tricky it can sometimes be. So yes, congrats to the management team of Day software, they have sold the company very much successfully.
But from my point of view, there is not so much to celebrate during an acquisition, especially for many of the employees and for many of the clients. The most difficult part is still to come. Acquisition does not matter so much compare to integration. It is like celebrating a deal whereas what matters is the Go Live of a project, not the signature of the contract.
Acquisition means “risky business”
OK let’s write “challenging” instead of “risky” to be positive.
I have not read today so much about how a mid size company with a strong taste of Swiss attitude and open source philosophy will fit into a mainly US centric international proprietary software company. I have also learnt to learn that during acquisitions people matter, but not so much actually. Not because you don’t want to, just because you cannot take care too much of the details and that’s what individuals are during an acquisition. Of course some individuals will find great opportunities thanks to the deal, that not the point, but just to say that an acquisition is not a fiesta for everyone, far from that: it mainly means “risky business”.
And one of the key advantages of Day has been several very gifted, skills and committed individuals. I am wondering how Adobe will handle the famous “The surprising truth about what motivates us”
Day Software was one of the rare independent high end WCM vendor and will now be just one of the products of a major software company. That’s a major evolution. Uncertainty will prevail for weeks if not months like for any similar acquisition, at least for many employees, even if, of course, Adobe may argue the opposite. Many are speculating and will speculate about the products integration, the strategic fit, the risks, the advantages, the constraints and so on. I don’t really care so much as my employer is a competitor so I have to focus at my clients, the team and to our own product but to finish my post on something funnier I cannot prevent myself motivating the shareholders of Day to agree on this acquisition. Let me be more specific:
Four Reasons to sell your shares to Adobe
1. The price is right
The share’s price of the company was extremely high before the acquisition, and I assumed the market was expecting something irrational, sorry I meant “exceptional”. And the market is so often right, so now it is really time to do something guys. Frankly I am impressed. I can write for pages why this valuation looks so unreal to me – maybe because I am coming from a different world - but frankly just sell for that price your shares. Exaggerating, I can write you are more likely to be hit by an asteroid tomorrow morning than to get a better deal soon.
2. That’s good for my business
Everybody knows an acquisition is always good for the business of the competition at least for the months to come. Beyond more than a year, as usual in this kind of business, it does not really matter so much, that’s another time frame and we always have to face competitors of many kinds: that’s business. What matters is adaptation and anticipation, not conjecture. So for the time being, I believe that’s good news for my employer’s business development as we have very frequently met Day software both in Europe and North America these last months.
3. You will give Adobe a cooler image
I really like the spirit and the values of the Day team. I don’t believe in miracles but I hope they will influence Adobe somewhere in a positive way. At least, they won’t be of any bad influence!
4. This summer is so boring like any summer, thanks for the show
For different reasons, and somewhat surprisingly, daily business is always thriving during July and August but conversely market news is usually so boring during these sunny weeks in Europe too. At last some news to really speak about during the summer break. Cool. And just before the CMS geek up cessions! Very cool.
So please give me a favor, just sell your shares to Adobe.
Further readings:
Analysis, interesting comments and some (inevitable) speculation from our WCM industry gurus: Adobe to acquire Day – First Take ECM perspective
Great critics – as usual – from Seth Gottlieb: Will Day stay committed to web standards under Adobe’s ownership?
CMS Wire article: Web CMS: Adobe Buys Day Software for US$ 240 Million
Boris-Magnolia’s own publicity but with several excellent remarks: http://www.betterfasterbigger.com/2010/07/day-to-be-acquired-by-adobe.html
Jon on tech blog post: http://jonontech.com/2010/07/28/a-fine-day-for-adobe/
Jeff Potts who was very quick to write about the acquisition: Adobe acquires Day Software for $240 million
I promised last time to show a simple way to render CRX content as PDF. The technique in question involves using a PDF form as the readymade container, into which form data is imported using XFDF. The latter is the XML version of Adobe's Forms Data Format, which in turn is a file format specifically designed to allow import and export of data to and from PDF forms.
The way it works is simple: Suppose you have a PDF form that you want to populate with data. You merely need to create a small data file (in XFDF format) and put it on the server. When a user requests the data file (which has a mimetype of "application/vnd.adobe.xfdf"), Acrobat Reader (or the Reader browser plug-in) detects the fact that form data will need to be imported into a form. The XFDF file itself contains a pointer to the actual form to be used. Reader fetches the form, then imports the form data into it, and renders the result as a PDF file containing the data. It all happens transparently to the user, and the user need only have Acrobat Reader (not a full copy of Acrobat Professional).
In the example I'm going to show below, we generate the XFDF file dynamically on the server, via a script called (what else?) xfdf.esp. We'll get to that in a minute.
The example we're going to talk about assumes that there is content in CRX (under a path of /content/films) that looks something like this:

This particular content node is named terminator_2. It lives under /content/films/ in my CRX repository.
Notice, in the above list, that there is a property (at the bottom) called sling:resourceType, set to a value of "films." This tells CRX to look under /apps/films for any scripts that might be necessary to render the content.
In previous blogs, I've shown how to write scripts that render this content as HTML, SVG, or CSV. Right now, what we need is an XFDF renderer. That turns out to be pretty easy to set up.
First, we need to create a PDF form to hold our data. In the Acrobat Professional forms editor, such a form looks like this:
I've shown how easy it is to push spreadsheet data into CRX (in such a way that there is one content node per row of data, where properties on that node correspond to column data). The reverse is also possible: It's easy to write a script that converts sibling nodes to row data formatted as CSV (comma-separated values per RFC 4180). Such a script, csv.esp, looks something like this:
The rules for escaping data for CSV are extremely simple. First, any data string that contains the double-quote (") character needs to have each such character converted to two double-quotes (""). Secondly, if the data contains a comma, the entire data string needs to be wrapped in quotation marks. The same is true for any data that contains double-quotes or line breaks (which RFC 4180 defines as CRLF -- carriage return followed by linefeed). The following very simple function enforces these escaping rules:
// Escape field data per RFC 4180
function escapeData( data ) {
    // a missing property becomes an empty field, not "undefined"
    if ( data === undefined || data === null )
        return "";
    // replace each " with ""
    data = String( data ).replace( /"/g, "\"\"" );
    // if the data contains a comma, a double quote, or a line break
    // (CR, LF, or CRLF), wrap the entire field in double quotes
    if ( /[",\r\n]/.test( data ) )
        return "\"" + data + "\"";
    return data;
}
The function that actually converts nodes to records is very straightforward as well:
function nodesToCSV( nodes, propertyNames ) {
    var records = [];
    for ( var i = 0; i < nodes.length; i++ ) {
        var aRecord = [];
        // suck in the data for each property, in the given order:
        for ( var k = 0; k < propertyNames.length; k++ ) {
            var data = nodes[ i ][ propertyNames[ k ] ];
            aRecord.push( escapeData( data ) );
        }
        records.push( aRecord.join( "," ) );
    }
    // RFC 4180 separates records with CRLF
    var CRLF = "\r\n";
    return records.join( CRLF );
}
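To try the two functions outside Sling, you can feed them plain JavaScript objects standing in for content nodes. The sketch below does that and also prepends a header record of column names, which RFC 4180 permits as an optional first line. The sample objects and the `toCSVWithHeader` wrapper are hypothetical, and both functions are restated in compact form (with the same quoting rules) so the block runs on its own.

```javascript
// Compact restatements of escapeData and nodesToCSV so this runs standalone
function escapeData(data) {
  if (data === undefined || data === null) return "";
  data = String(data).replace(/"/g, '""');
  return /[",\r\n]/.test(data) ? `"${data}"` : data;
}

function nodesToCSV(nodes, propertyNames) {
  return nodes
    .map((node) => propertyNames.map((p) => escapeData(node[p])).join(","))
    .join("\r\n");
}

// Hypothetical wrapper: an optional header record, then the data records
function toCSVWithHeader(nodes, propertyNames) {
  const header = propertyNames.map(escapeData).join(",");
  return header + "\r\n" + nodesToCSV(nodes, propertyNames);
}

// Plain objects standing in for JCR nodes (made-up sample values)
const films = [
  { Title: "Film A", Year: "1999" },
  { Title: 'Say "Hi", Bob', Year: "2001" },
];
console.log(toCSVWithHeader(films, ["Title", "Year"]));
```

The second record comes out as `"Say ""Hi"", Bob",2001`: the embedded quotes are doubled first, and the whole field is then wrapped because it contains both quotes and a comma.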
Note that we explicitly pass the function a list of property names, rather than (say) letting the function discover property names by introspection. The reason is that if we simply gather property names with a for/in loop, we get them back in no particular order, and the order will, in fact, vary from content node to content node even if all of the nodes have properties with exactly the same names. That would scramble the column data in our CSV file. We don't want that. Hence, we pass in an array of property names and march through the array in orderly fashion when pulling property data from each node.
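The ordering hazard is easy to reproduce with plain JavaScript objects standing in for content nodes. In modern JavaScript engines, for/in over ordinary string keys follows insertion order, so two objects with identical property names but different creation histories enumerate differently. The objects and helper names below are hypothetical.

```javascript
// Two stand-ins for content nodes with identical property names,
// created with the properties in a different order
const nodeA = { Title: "Film A", Year: "1999" };
const nodeB = { Year: "2001", Title: "Film B" };

// Introspective approach: column order follows each object's own key order
function rowByIntrospection(node) {
  const values = [];
  for (const name in node) values.push(node[name]);
  return values.join(",");
}

// Explicit approach: one fixed list of property names for every node
function rowByNames(node, propertyNames) {
  return propertyNames.map((name) => node[name]).join(",");
}

console.log(rowByIntrospection(nodeA)); // Film A,1999
console.log(rowByIntrospection(nodeB)); // 2001,Film B  (columns swapped)
const columns = ["Title", "Year"];
console.log(rowByNames(nodeB, columns)); // Film B,2001
```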
When I placed csv.esp in my repository under /apps/films and then navigated to http://localhost:7402/content/films.csv, CRX dutifully fired my script and produced a CSV file containing all of the data from my /content/films nodes. My browser, in turn, informed me that I was downloading a file of type "csv" and asked which program I wanted to open it with; I specified scalc.exe, and OpenOffice loaded the file as a spreadsheet.
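The request URL in that example carries what Sling needs to pick a script: a resource path and an extension. The simplified sketch below splits such a path into resource path, selectors, and extension. The `decompose` function is hypothetical and assumes the resource path itself contains no dots; real Sling resolves the path against the repository to find where the resource path ends, and also handles suffixes.

```javascript
// Simplified decomposition of a Sling-style request path into
// resource path, selectors, and extension.
// Assumes the resource path contains no dots and there is no suffix.
function decompose(requestPath) {
  const lastSlash = requestPath.lastIndexOf("/");
  const parts = requestPath.slice(lastSlash + 1).split(".");
  return {
    resourcePath: requestPath.slice(0, lastSlash + 1) + parts[0],
    // everything between the name and the extension
    selectors: parts.slice(1, -1),
    extension: parts.length > 1 ? parts[parts.length - 1] : "",
  };
}

console.log(JSON.stringify(decompose("/content/films.csv")));
// {"resourcePath":"/content/films","selectors":[],"extension":"csv"}
```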
So far, I've shown how to render /films data as HTML, SVG, and CSV. Next time, I want to show a simple trick for rendering the data as PDF. It's easier than you think!
It's always good to get a glimpse into the approaches taken by non-OSS JCR implementations. In a recent technical article on the developerWorks website, Malarvizhi Kandasamy describes how IBM goes about JCR full-text search. The actual engine is Juru, a Java library developed by the IBM Haifa Research Lab.
According to the article, Juru is capable of some natural-language processing, such as stemming and finding similar spellings.
IBM uses a JCR-compliant repository in a number of its products, e.g. Lotus Web Content Management and WebSphere Portal.
A few days ago, I talked about how to "shred and store" a spreadsheet -- i.e., how to push rows of a spreadsheet into individual nodes in CRX (one node per row, with column data stored as properties). I also gave JavaScript code for doing this in an OpenOffice macro. For testing purposes, I used the CSV file a1-film.csv, representing 1741 movies catalogued by Georgia Tech's College of Computing.
After running my OpenOffice macro on the Georgia Tech CSV file, my CRX repository now contains movie data (Title, Director, Year, etc.) for 1741 films, each film with its own nt:unstructured node under the path /content/films/. In the CRX Content Explorer, a given node (in this case, the node at http://localhost:7402/content/films/terminator_2) looks something like this:
Since version 2.0, CRX has come with CRXDE Lite (CRX Development Environment - Lite), a web-based tool that eases the development of CRX-based applications. CRXDE Lite is implemented with the Ext JS JavaScript library. It aims to replace the CRX 1.x Content Explorer with a modern AJAX-based repository browser and editor, but it also provides improved means for searching, code editing, integration with code version management, and handling of non-scripted code. As a tool primarily for developers, it also offers server-side development features such as compilation of Java code, OSGi bundle creation and auto-deployment, a project wizard, etc.
In a recent blog, I talked about how easy it is to store snippets of text from OpenOffice in a CRX repository using a little bit of JavaScript and the Sling REST API. While being able to store arbitrary bits of text this way is certainly useful, it would be even more useful to be able to store spreadsheet data. Of course, storing a spreadsheet in CRX, per se, is not much of a challenge: with WebDAV, it's a matter of drag and drop. But storing an entire spreadsheet as a single monolithic content item doesn't necessarily give you the greatest content-management bang for the buck. Often, what you really want to do is granularize the spreadsheet into records (or row data), and store individual rows as content items. (You could take it further and store individual cells as content items, but that would probably be overkill for most situations, although there's certainly nothing preventing you from doing it.)
In the database world, where decisions often have to be made as to how best to decompose an XML document when mapping it to tables in a database, this general process (of decomposing a large document along the lines of its natural internal fine-structure) is known as shredding. What would be handy is to have an OpenOffice macro that could shred a spreadsheet into rows, and push the rows into nodes in CRX. That's what I propose to show you right now.
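Stripped of the OpenOffice and HTTP plumbing, shredding amounts to turning each row into a record keyed by the column names in the header row. The sketch below assumes the rows have already been read into arrays of strings; the `shred` function and the sample rows are hypothetical. The OpenOffice code that follows handles the reading part.

```javascript
// Shred tabular data into one record per row.
// rows[0] is assumed to hold the column names.
function shred(rows) {
  const [columnNames, ...dataRows] = rows;
  return dataRows.map((row) => {
    const record = {};
    // each column name becomes a property of the record
    columnNames.forEach((name, i) => {
      record[name] = row[i];
    });
    return record;
  });
}

const rows = [
  ["Title", "Year"],
  ["Film A", "1999"],
  ["Film B", "2001"],
];
console.log(JSON.stringify(shred(rows)));
// [{"Title":"Film A","Year":"1999"},{"Title":"Film B","Year":"2001"}]
```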
It turns out to be pretty easy to parse a spreadsheet in an OpenOffice macro. Using JavaScript:
// Make the UNO classes used below visible to the script
importClass(Packages.com.sun.star.uno.UnoRuntime);
importClass(Packages.com.sun.star.sheet.XSpreadsheetDocument);
// First, get the document object
// from the scripting context
var oDoc = XSCRIPTCONTEXT.getDocument();
// Next, get the XSpreadsheetDocument
// interface from the document
var xSDoc = UnoRuntime.queryInterface(XSpreadsheetDocument, oDoc);
// Then get a reference to the sheets for this doc
var sheets = xSDoc.getSheets();
// get Sheet1
var sheet1 = sheets.getByName("Sheet1");
Once you've gotten the sheet reference, you can use it to obtain a cell reference:
// column and row are zero-based indices (column first)
var cell = sheet1.getObject().getCellByPosition( column, row );
The cell, in turn, contains data, which (depending on whether you're dealing with a native OpenOffice spreadsheet or a freshly imported CSV file) can be a floating-point value, a string, or something else. For purposes of this discussion, I'm going to assume that you've just imported a CSV or tab-delimited file into OpenOffice, in which case all cells will automatically contain string data. To get the string data from a cell in a freshly imported CSV file, use:
var content = cell.getFormula();
At least, that's what works in OpenOffice 3.2.
The general plan of attack, then, is to write one function that can parse a row's worth of data out of a spreadsheet and another function that can persist a row of data as a content item in CRX. Then it should be possible to create a macro that simply loops over all rows in a spreadsheet and pushes them out to the repository.
The row-parsing function is pretty straightforward:
function getRow( sheet, rownumber, startColumn, endColumn ) {
var obj = sheet.getObject();
var record = [];
for (var k = startColumn; k < endColumn ; k++) {
var cell = obj.getCellByPosition( k, rownumber );
var content = cell.getFormula();
record.push( content );
}
return record;
}
Given a reference to a sheet, along with a row number and the starting and ending column numbers (the ending column is exclusive), this function loops through the cells and pushes each cell's value into an array. The returned array represents a row's worth of data.
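Because getRow() touches the sheet only through getObject(), getCellByPosition(), and getFormula(), you can exercise it outside OpenOffice with a small stand-in object. The `fakeSheet` helper below is hypothetical and mimics only those three calls, not the UNO API as a whole. Note that getCellByPosition() takes the column first and the row second, and that endColumn is exclusive.

```javascript
// getRow() as in the text: columns startColumn..endColumn-1 of one row
function getRow(sheet, rownumber, startColumn, endColumn) {
  const obj = sheet.getObject();
  const record = [];
  for (let k = startColumn; k < endColumn; k++) {
    const cell = obj.getCellByPosition(k, rownumber);
    record.push(cell.getFormula());
  }
  return record;
}

// Hypothetical stand-in for a sheet, backed by a 2D array of strings.
// It mimics only the calls getRow() makes; it is not the UNO API.
function fakeSheet(grid) {
  return {
    getObject: () => ({
      getCellByPosition: (column, row) => ({
        getFormula: () => grid[row][column],
      }),
    }),
  };
}

const sheet = fakeSheet([
  ["Title", "Director", "Year"],
  ["Film A", "Someone", "1999"],
]);
console.log(getRow(sheet, 1, 0, 3).join(" | ")); // Film A | Someone | 1999
```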
To persist a row to CRX, we have a function that looks like this:
function persistRow( sheet, rownumber, startColumn, endColumn ) {
// get first row of data (column names)
var columnNames = getRow( sheet, 0, startColumn, endColumn );
// get specified record
var row = getRow( sheet, rownumber, startColumn, endColumn );
// build the request
var request = {};
request[":nameHint"] = row[2]; // Title
request["sling:resourceType"] = "films";
for ( var i = 0; i < columnNames.length; i++) {
request[ columnNames[ i ] ] = row[ i ];
}
var data = createRequest( request );
// where to store it
var url = "http://localhost:7402/content/films/";
// finally, hit the repository
var response = doJavaPOST( url, data );
return response;
}
Notice that the code assumes that the first row of "data" in the spreadsheet contains the column names, and that the film's title is in the third column (row[2]). This was the case with the spreadsheet I used for testing this macro, namely a1-film.csv, representing 1741 movies catalogued by Georgia Tech's College of Computing. Each row in the spreadsheet has information for a particular film, such as the film's title, the year the film was made, its genre, the name of the director, major actors and actresses, etc.
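The helpers createRequest() and doJavaPOST() belong to the complete macro and aren't shown in the persistRow() snippet above. Since Sling's POST servlet accepts ordinary form parameters, one plausible shape for createRequest() is an application/x-www-form-urlencoded encoder like the hypothetical `encodeFormData` below. It is written in ES5 style and relies only on `encodeURIComponent`, which is standard ECMAScript; whether the real helper works this way is an assumption.

```javascript
// Hypothetical stand-in for createRequest(): encode a map of
// parameter names to values as an application/x-www-form-urlencoded body.
function encodeFormData(params) {
  var pairs = [];
  for (var name in params) {
    // skip anything inherited from the prototype chain
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      pairs.push(encodeURIComponent(name) + "=" +
        encodeURIComponent(params[name]));
    }
  }
  return pairs.join("&");
}

var body = encodeFormData({
  ":nameHint": "Terminator 2",
  "sling:resourceType": "films"
});
console.log(body); // %3AnameHint=Terminator%202&sling%3AresourceType=films
```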
Without further ado, here is the complete code for the OpenOffice macro:
You'll notice that the code creates a JEditorPane window to act as an error console. When you run the macro, a JOptionPane dialog appears, asking you to supply the number of rows and columns in the spreadsheet. (For the Georgia Tech spreadsheet, you can enter "1741,8", minus quotes.) Once you dismiss the dialog, the code goes to work looping over all the rows in the spreadsheet, posting each row to CRX at a path of http://localhost:7402/content/films/.
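The complete macro isn't reproduced here, but its overall flow can be sketched: parse the "rows,columns" reply from the dialog, then post every data row. In the sketch below, `parseDimensions` and `shredAll` are hypothetical names. The persist function is passed in as a callback taking (rownumber, startColumn, endColumn), i.e. persistRow() with the sheet already bound, so the loop can run without OpenOffice or CRX. The sketch also assumes the row count excludes the header row at index 0, a convention the original macro may not share.

```javascript
// Hypothetical sketch of the macro's driver logic.
// Parses a "rows,columns" reply such as "1741,8".
function parseDimensions(reply) {
  const parts = String(reply).split(",").map((s) => parseInt(s.trim(), 10));
  if (parts.length !== 2 || parts.some((n) => !Number.isInteger(n) || n < 1)) {
    throw new Error("Expected two positive integers, e.g. 1741,8");
  }
  return { rows: parts[0], columns: parts[1] };
}

// Posts data rows 1..rows (row 0 holds the column names).
// Failures are collected, error-console style, instead of aborting the run.
function shredAll(reply, persistRow) {
  const { rows, columns } = parseDimensions(reply);
  const errors = [];
  for (let r = 1; r <= rows; r++) {
    try {
      persistRow(r, 0, columns);
    } catch (e) {
      errors.push(`row ${r}: ${e.message}`);
    }
  }
  return errors;
}

// Usage with a stand-in that just records which rows were "posted"
const posted = [];
const errors = shredAll("3,8", (row) => posted.push(row));
console.log(posted, errors); // [ 1, 2, 3 ] []
```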
Each new node is named according to a :nameHint parameter based on the Title of the film.
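That is how a title such as "Terminator 2" ends up as a node named terminator_2. The sketch below is a rough approximation of that filtering (lowercase, replace disallowed characters with underscores, collapse runs of underscores). `nameFromHint` is a hypothetical helper, not Sling's implementation, which also deals with details such as name collisions that this sketch omits; check the Sling POST servlet documentation for the exact rules.

```javascript
// Rough approximation of turning a :nameHint into a node name.
// Not Sling's implementation; collision handling is omitted.
function nameFromHint(hint) {
  return String(hint)
    .toLowerCase()
    .replace(/[^a-z0-9_]/g, "_") // disallowed characters become "_"
    .replace(/_+/g, "_");        // collapse runs of underscores
}

console.log(nameFromHint("Terminator 2")); // terminator_2
console.log(nameFromHint("Dr. Strangelove")); // dr_strangelove
```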
Notice also that we give each node a sling:resourceType of "films" (this happens in the persistRow() function). This fact will be important in a later blog when I show how to write server-side scripts that handle various types of requests for film data.
And that's about it: Now you know how to shred a spreadsheet (say that 3 times in a row fast...) and store the results in CRX, using OpenOffice.
Lately I've been doing a fair amount of server-side scripting using ESP (ECMAScript Pages) in Sling. At first blush, such pages tend to look a lot like JavaServer Pages (JSP), since they usually contain a lot of scriptlet markup, like:
<% // script code here %>
and
<%= // stuff to be evaluated here %>
So it's tempting to think ESP pages are simply a different flavor of JSP. But they're not. From what I can tell, ESP pages are just server pages that get handed to an EspReader before being served out. The EspReader, in turn, translates the scriptlet tags and expression tags into plain JavaScript, which is then executed (nothing gets compiled into a servlet). Bottom line: ESP is not JSP, and despite the shared scriptlet tags, things work quite a bit differently in each case.
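A toy translator shows the idea: template text and `<%= %>` expressions become write calls, `<% %>` code is copied through verbatim, and the resulting JavaScript is simply run. The `espToJs` and `render` functions and the stand-in `out` object are hypothetical illustrations, not Sling's EspReader, which handles more cases than this.

```javascript
// Toy ESP-to-JavaScript translator (illustration only).
// Text -> out.write("..."); <%= expr %> -> out.write(expr); <% code %> -> code
function espToJs(template) {
  const tag = /<%(=?)([\s\S]*?)%>/g;
  let js = "";
  let last = 0;
  let m;
  while ((m = tag.exec(template)) !== null) {
    const text = template.slice(last, m.index);
    if (text) js += "out.write(" + JSON.stringify(text) + ");\n";
    js += m[1] === "="
      ? "out.write(String(" + m[2].trim() + "));\n" // expression tag
      : m[2] + "\n";                                // scriptlet tag
    last = tag.lastIndex;
  }
  const rest = template.slice(last);
  if (rest) js += "out.write(" + JSON.stringify(rest) + ");\n";
  return js;
}

// Run the generated script with a stand-in "out" that collects output
function render(template) {
  const chunks = [];
  const out = { write: (s) => chunks.push(s) };
  new Function("out", espToJs(template))(out);
  return chunks.join("");
}

console.log(render("<p><% var n = 2 + 3; %>n = <%= n %></p>")); // <p>n = 5</p>
```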
Suppose you want to detect, from an ESP page or a JSP page, what kind of browser a given page request came from. In a Sling JSP page you could do:
<%@taglib prefix="sling" uri="http://sling.apache.org/taglibs/sling/1.0" %>
<sling:defineObjects/>
<html><body>
<%
java.util.Enumeration c = request.getHeaders("User-Agent");
String s = "";
while ( c.hasMoreElements() )
s += c.nextElement();
%>
<%= s %>
</body></html>
But what do you do in ESP? Remember, <sling:defineObjects/> is not available in ESP.
It turns out that Sling automatically (without the need for any directives) exposes certain globals to the JavaScript Context at runtime, and one of them is a request object. Thus, in ESP you'd simply do:
<%
c = request.getHeaders("User-Agent");
s = "";
while ( c.hasMoreElements() )
s += c.nextElement();
%>
<%= s %>
Very similar to the JSP version.
So the next question I had was, what are the other globals that are exported into the JavaScript runtime scope by Sling? From what I can determine, the Sling globals available in ESP are:
currentNode
currentSession
log
out
reader
request
resource
response
sling
currentNode is the JCR node underlying the current resource (present when the resource is backed by a JCR node); currentSession is what it sounds like, a reference to the current Session object; log is an org.slf4j.Logger; out is the writer for the response body (response.getWriter()); reader returns request.getReader(), which allows for reading the request body; request is a reference to the SlingHttpServletRequest; resource is the current Resource; response is, of course, a reference to the SlingHttpServletResponse; and sling is a SlingScriptHelper. These are available throughout the life of any ESP script in Sling.
The nice part about server-side scripting in Sling (one of many nice parts), incidentally, is that you don't have to choose to do just ESP pages or just JSP; you can write an ESP handler for one situation and a JSP for another, and use ESP/JSP in any combination. You're not locked into one technology or the other.
For more information, try the Sling Javadocs here or Day's page of resources here (note, in particular, the list of References on the right).
The current board of directors of the Apache Software Foundation has just been elected - congratulations to:
Roy and Bertrand are colleagues of mine at Day Software.
To find out more about what the board actually does, have a look at "How the ASF works".
“Flexible access to people and resources can be enormously powerful in a world driven by changes that, more often than not, lead us in unanticipated directions…we need to become more adept at ‘capability leverage’ – finding and accessing complementary capabilities, wherever they reside in the world, to deliver more value.”
- From “The Power of Pull” by J Hagel, J S Brown, L Davidson
Businesses, in particular in the Western world, are becoming more and more knowledge-intensive with an increasing part of the workforce engaged in knowledge-based work. A study by The Work Foundation has estimated that we have a 30-30-40 workforce - 30 per cent in jobs with high knowledge content, 30 per cent in jobs with some knowledge content, and 40 per cent in jobs with less knowledge content.