XML

General Notes

Most of my XML work with InDesign happened around CS2, before the convenience methods that exist in CS4 were available. I simply ported that code forward and have yet to get a chance to update it. I’m leaving these snippets here in case the convenience methods don’t cover the needs described here.

“get every” vs. listing items

The “get every” method of getting objects may be faster when it comes to walking through the hierarchy, but it’s of limited use because it only works at the level of the hierarchy we’re sitting on. It doesn’t automatically go into nested items like the Finder. In other words, “get every” does not return nested nodes. Other aspects of the API, however, do give back the nested elements. It’s not that AppleScript is wonky; the difference comes from how developers implemented it.
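To make the one-level behaviour concrete without needing InDesign, here is a minimal sketch that uses plain AppleScript lists as a stand-in for XML elements. The nested list and the FlattenItems handler are made up for illustration; they are not part of the InDesign API. Asking for every item only gives back the top level, so reaching the nested items takes a recursive handler.

```applescript
-- Hypothetical helper: walks a nested list and returns all non-list items in order
on FlattenItems(theList)
	set flatList to {}
	repeat with i from 1 to count of theList
		set theItem to item i of theList
		if class of theItem is list then
			-- a nested list plays the role of a nested element: recurse into it
			set flatList to flatList & (my FlattenItems(theItem))
		else
			set end of flatList to theItem
		end if
	end repeat
	return flatList
end FlattenItems

-- Stand-in hierarchy: "story" contains "book", which contains two chapters
set nestedItems to {"story", {"book", {"chapter_1", "chapter_2"}}}

set topLevelItems to every item of nestedItems -- only the 2 items at this level
set allItems to FlattenItems(nestedItems) -- {"story", "book", "chapter_1", "chapter_2"}
```

The same idea, recursing by hand where “get every” stops, is what the ScanHierarchy handler further down does with real XML elements.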

The XML to Layout Bridge

I think one of the best hidden features of the InDesign AppleScript API is the ability to traverse the object hierarchy from an XML Element to the actual layout. This can be very powerful, but the path to get there is a bit convoluted:

  • XML Element
  • Insertion Point (every character in a tag, including nested content, has an insertion point, plus possibly an insertion point for the tag itself).
  • Parent Text Frame
  • Parent (in this case: Page)
  • Name (the page number as a string. This is dynamic when the document is placed in a book file and is allowed to number pages automatically).

Entities (Dingbats)

The following XML shows how InDesign handles Entities:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE RootTag [
<!ELEMENT RootTag (ChildTag) >
<!ELEMENT ChildTag (#PCDATA) >
<!ENTITY ListBullet "•">
]>

<RootTag>
    <ChildTag>&ListBullet;</ChildTag>
    <ChildTag>&bull;</ChildTag> <!-- ENTITY NOT FOUND -->
    <ChildTag>•</ChildTag>
</RootTag>

  • <!ELEMENT ChildTag (#PCDATA) > must use #PCDATA and not #CDATA.
  • <!ENTITY ListBullet "•"> must contain the character itself, and not an HTML entity. <!ENTITY ListBullet &bull;> and <!ENTITY ListBullet "&bull;"> do not work.

Travel XML Hierarchy

The following code shows how to travel down the XML of a document and return the contents of a tag. It first gets the root of the XML, in this example called “story”, and then goes one element deeper to “book”. The same commands can be performed on XML Elements as well, but we want to use XML Items instead because XML Items drill down to XML Elements and not vice versa. In addition, XML Item is the de facto parent class of all XML classes in InDesign.

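All of the tag contents get returned as plain text, but each nested tag comes back as well, apparently as a marker character that does not display (the “” here, and the first string passed to replace_chars in the script). The replace_chars function takes those markers out very quickly.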

property XMLItemList : {}
global documentObject

tell application "Adobe InDesign CS2"
	set documentObject to object reference of document 1
	set documentObjectElement to associated XML element of documentObject
	set StoryElement to GetXMLElement(documentObjectElement, "story") of me
	if StoryElement is null then return

	set BookElement to GetXMLElement(StoryElement, "book") of me
	if BookElement is null then return

	set BookElementContents to contents of contents of BookElement as string
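	--the search string is the marker character that stands in for nested tags; if it arrives empty (for example after copy and paste), this call changes nothing
	set BookElementContents to replace_chars(BookElementContents, "", "") of me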
end tell

return BookElementContents

on GetXMLElement(ParentElement, ChildElementName)
	tell application "Adobe InDesign CS2"
		tell documentObject
			set ParentElementObject to object reference of ParentElement
			tell ParentElementObject
				set ElementCount to (get count of XML Items)
				repeat with x from 1 to ElementCount
					set ChildElement to XML item x --Here's what we mean by using XML items as opposed to XML elements
					set ChildElement to object reference of ChildElement
					if class of ChildElement is not DTD then
						set ChildElementMarkupTag to markup tag of ChildElement
						if name of ChildElementMarkupTag is ChildElementName then
							return ChildElement
						end if
					end if
				end repeat
			end tell
		end tell
	end tell
	return null
end GetXMLElement

on replace_chars(this_text, search_string, replacement_string)
	set AppleScript's text item delimiters to the search_string
	set the item_list to every text item of this_text
	set AppleScript's text item delimiters to the replacement_string
	set this_text to the item_list as string
	set AppleScript's text item delimiters to ""
	return this_text
end replace_chars
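
A side note on replace_chars: when it finishes, it resets the text item delimiters to "" rather than restoring whatever was in effect before the call. A slightly more defensive variant might look like the sketch below. The name replace_chars_safely is hypothetical, and "#" is only a visible stand-in for the tag marker so the result is easy to see.

```applescript
-- Hypothetical variant of replace_chars: same split-and-join technique,
-- but it puts back the delimiters that were in effect before the call
on replace_chars_safely(this_text, search_string, replacement_string)
	set saved_delimiters to AppleScript's text item delimiters
	-- split the text wherever the search string occurs
	set AppleScript's text item delimiters to search_string
	set item_list to every text item of this_text
	-- join the pieces back together with the replacement string
	set AppleScript's text item delimiters to replacement_string
	set this_text to item_list as string
	set AppleScript's text item delimiters to saved_delimiters
	return this_text
end replace_chars_safely

-- "#" stands in for the marker that represents a nested tag
set cleanedText to replace_chars_safely("The #Great# Book", "#", "")
--> "The Great Book"
```

Restoring the saved value matters mostly when other parts of a longer script depend on their own delimiter settings.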

Placing Tagged Content

Use the script above, but when placing the text, don’t return the string; use the desired XML Element object directly.

tell application "Adobe InDesign CS2"
	tell document 1
		place XML BookElement using text frame 1 --BookElement comes from the script above
	end tell
end tell

Good generic InDesign XML AppleScript code

ScanHierarchy

This walks the hierarchy recursively but doesn’t return anything; it just travels the depth of the document. It is, however, one way outside of XQL and XPath to get the needed content and objects.

on ScanHierarchy(theElement)
	set theList to {}
	tell application "Adobe InDesign CS2"
		tell theElement
			set theList to (get every XML element)
			repeat with n from 1 to number of items in theList
				set theItem to item n of theList
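				--insert custom code to retrieve data here
				--TOC, TitleIndex and AuthorIndex are assumed to be list properties (or globals) defined elsewhere in the script;
				--GetTextOfChild and GetAuthorName are helper handlers that are not shown on this page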
				if name of markup tag of theItem is in {"section_title"} then
					set theSectionTitle to text of theItem
					set PageNo to GetPageNumber(theItem) of me
					set end of TOC to {title:theSectionTitle, page:PageNo, level:1}

				else if name of markup tag of theItem is in {"sub_section_title"} then
					set theSubSectionTitle to text of theItem
					set PageNo to GetPageNumber(theItem) of me
					set end of TOC to {title:theSubSectionTitle, page:PageNo, level:2}
				else if name of markup tag of theItem is in {"book_title"} then
					set BookTitleArticleValue to GetTextOfChild(theItem, "book_title_article") of me
					set BookTitleTitleValue to GetTextOfChild(theItem, "book_title_title") of me
					set BookTitleSubtitleValue to GetTextOfChild(theItem, "book_title_subtitle") of me
					set BookTitleEditionValue to GetTextOfChild(theItem, "book_title_edition") of me
					set BookTitleVersionValue to GetTextOfChild(theItem, "book_title_version") of me
					set PageNo to GetPageNumber(theItem) of me

					set end of TitleIndex to {BookTitleArticle:BookTitleArticleValue, BookTitleTitle:BookTitleTitleValue, BookTitleSubtitle:BookTitleSubtitleValue, BookTitleEdition:BookTitleEditionValue, BookTitleVersion:BookTitleVersionValue, page:PageNo}

				else if name of markup tag of theItem is in {"author_name", "author_org"} then
					if name of markup tag of theItem is "author_name" then
						--we need to keep the author components separate; AuthorIndex
						set authorName to GetAuthorName(theItem) of me
						set PageNo to GetPageNumber(theItem) of me
						set end of AuthorIndex to {firstName:(item 1 of authorName), MiddleInitial:(item 2 of authorName), lastName:(item 3 of authorName), page:PageNo}
					else if name of markup tag of theItem is "author_org" then
						--placeholder: nothing is done with author_org yet
					end if
				end if
				--end custom code
				my ScanHierarchy(theItem)
			end repeat
		end tell
	end tell
end ScanHierarchy
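
The pattern ScanHierarchy relies on, recursing through children while appending matches to a script-level list, can be tried outside InDesign. The sketch below assumes a made-up tree of plain AppleScript records (tagName, tagText and children are invented labels, not InDesign properties), with a FoundTitles property standing in for TOC and a hypothetical ScanRecords handler standing in for ScanHierarchy.

```applescript
property FoundTitles : {}

-- Hypothetical stand-in for ScanHierarchy that walks records instead of XML elements
on ScanRecords(theNode)
	set childList to children of theNode
	repeat with n from 1 to count of childList
		set childNode to item n of childList
		-- custom code: collect the text of every section_title, at any depth
		if tagName of childNode is "section_title" then
			set end of FoundTitles to tagText of childNode
		end if
		-- recurse so that nested children are visited too
		my ScanRecords(childNode)
	end repeat
end ScanRecords

-- Build a small sample tree: story > (section_title, section > section_title)
set nestedTitle to {tagName:"section_title", tagText:"Nested", children:{}}
set sectionNode to {tagName:"section", tagText:"", children:{nestedTitle}}
set introTitle to {tagName:"section_title", tagText:"Intro", children:{}}
set sampleTree to {tagName:"story", tagText:"", children:{introTitle, sectionNode}}

set FoundTitles to {} -- reset the collector before each scan
ScanRecords(sampleTree)
set collectedTitles to FoundTitles --> {"Intro", "Nested"}
```

Resetting the collecting list before each run is worth keeping in the real script as well, since script properties can hold on to values from a previous run.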

GetPageNumber: An example of getting from the XML to the layout

Here’s where we cross the all-important bridge from the XML to the layout.

on GetPageNumber(theElement)
	tell application "Adobe InDesign CS2"
		set theElementObj to object reference of theElement
		set theInsertionPointCount to (get count of insertion points in theElementObj) --each character has an insertion point.
		if theInsertionPointCount > 0 then
			set theInsertionPoint to insertion point 1 of theElementObj --we only care about the first insertion point
			set theInsertionPointObj to object reference of theInsertionPoint
			set theParentTextFrame to item 1 of parent text frames of theInsertionPointObj --note the use of "item" and not "parent text frame"; now we just go up the hierarchy of the objects in the layout
			set thePage to parent of theParentTextFrame --oddly enough, we don't make an object here; we get a "variable [x] not defined" error
			return name of thePage
		else
			return ""
		end if
	end tell
	return ""
end GetPageNumber

GetRootTag

This returns the all-important root tag of a document, if there is one; otherwise it returns null.

on GetRootTag(theDocument)
	tell application "Adobe InDesign CS2"
		set documentObject to object reference of theDocument
		set documentObjectElement to associated XML element of documentObject
		tell documentObjectElement
			set ElementList to (get every XML element)
			repeat with x from 1 to count of items in ElementList
				set ChildElement to item x of ElementList
				set ChildElement to object reference of ChildElement
				if class of ChildElement is not DTD then
					return ChildElement
				end if
			end repeat
		end tell
	end tell
	return null
end GetRootTag

PlaceXMLItemUsingPageItem(theXMLItem, thePageItem)

This little bit of code is a bit of a rehash of code that can work just about anywhere. It’s more than a little redundant, but because the code it contains is used everywhere, if it ever needs to change I only have to modify it in one place. I tend to write AppleScript that’s very subroutine-oriented because I find it easier to read, a lot more portable, and above all easier to maintain in the long run.

on PlaceXMLItemUsingPageItem(theXMLItem, thePageItem)
	tell application "Adobe InDesign CS2"
		place XML theXMLItem using thePageItem
	end tell
end PlaceXMLItemUsingPageItem