Category Archives: Code

Apple needs to rein it back in

We don’t need major OS releases every year. We don’t need each OS release to have a huge list of new features. We need our computers, phones, and tablets to work well first so we can enjoy new features released at a healthy, gradual, sustainable pace.
Apple has lost the functional high ground

Hear, hear. I love Mac OS because it has a style and feature set that just “clicks” with the way I think. Yosemite looks cool and all, but it doesn’t work nearly as smoothly as past iterations of Mac OS. I have things to do for both work and school. I would prefer an OS that is simple and reasonably bug-free over some new, arguably flashy feature or tool. One could make the argument that I should just move to Unix, but then the “click” is lost. Apple needs to get its collective head back on straight and get us back to a paradigm where upgrades prioritize stability over other considerations.

Processing 1,200 DocBook XML Files

XML gets a bad rap, and I am not going to debate its merits here. That has been done ad nauseam over the years, and still no one has a better alternative when it comes to tagging anything beyond the simplest books. Yes, there is JSON, but as soon as a title has a piece of art or an index I need linked, JSON is no longer the best tool. I have my own opinions about it, but the fact remains that XML is the lingua franca for tagged content in my industry, and I don’t have so many issues with it that I feel compelled to propose my own fixes for it.

The tools that exist to work with XML, however, speak to XML’s bad rap as much as anything. There is no single “go to” for all XML work. By that I mean, whereas one can use Eclipse all day to code C or Java, or use Xcode for Objective-C and C, there is no single app that does everything for XML. In addition, the two technologies that exist for manipulating XML content, XSL and XQuery, feel as though they were developed from two completely different directions and simply thrown into the XML package.

My chosen tools and their uses

Diagnose and Repair

oXygen XML Editor
I forget what landed me on oXygen’s doorstep, but when manually inspecting an XML file for the first time, this is my “go to.” The errors that are returned from parsers like Xerces can be intensely cryptic, but oXygen provides a useful interface to make drilling down to certain errors relatively straightforward. XML-specific editing hooks like automagically closing tags and validating against the declared DTD make working with XML worth the purchase price. There are times, however, when some UTF-8 encoding issue prevents even opening the file, at which point I move on to Plan B: xmllint.

xmllint goes where oXygen fears to tread. If I have any error that prevents oXygen from opening a file for any reason, xmllint will tell me exactly where that error is. I’d like to think that an editor as robust as oXygen could handle the same functionality as xmllint, but it doesn’t. I don’t use the command line for anything except the simplest of edits (no need to torture myself with vim or emacs if I don’t need to), so I then move on to the next tool for the fix: BBEdit.
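
The “tell me exactly where that error is” step can also be scripted when there are hundreds of files to triage. What follows is only a rough sketch in Python using the standard library parser, not xmllint itself and not part of the toolchain described here. The names first_xml_error and report are hypothetical. It checks well-formedness only: it does not validate against a DTD, and entities declared in an external DTD will be reported as errors.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def first_xml_error(path):
    """Return None if the file is well-formed, else (line, column, message)."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line, column = err.position  # position of the first error found
        return line, column, str(err)
    return None


def report(folder):
    """Check every .xml file under folder; map relative path -> first error."""
    failures = {}
    for path in sorted(Path(folder).rglob("*.xml")):
        problem = first_xml_error(path)
        if problem is not None:
            failures[str(path.relative_to(folder))] = problem
    return failures
```

Calling report on the archive folder returns only the files that fail, each with the line and column of its first error, which is enough to know where to point BBEdit.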

BBEdit is the stuff of legend on the Mac, and I don’t think I need to sing its praises to the choir here. While it doesn’t have the XML-specific hooks of oXygen, it does open those files that oXygen barfs on, and has killer search and replace features for fixing problems. One of the best parts of BBEdit is that even if it does come across a UTF-8 encoding issue, it will open the file anyway, which means I can make the fix and move on to transformations.


Updating an XML document’s structure is inevitable when prepping content. XML offers XSL, but I rarely work in a vacuum. Typically, I am mashing some content with some other content, and for that I need to be able to manage a content store, which XSL doesn’t allow on its own. Enter Cocoa.

I hate to say it, because other developers might (will) cringe, but I use Xcode as much as a deep, rich scripting platform as I do for making applications. If I need to deploy a tool for my team, I can do so in no time, but more often than not, I am the one developing and executing the solution. I’ve developed a couple of strategies around this.

First, all solutions begin as command line applications. XML work almost never needs an interface, so I develop all the logic in controllers that link to the main function. If I need an interface, then adding one is a cinch coming from a command line app (but not the other way around).

Second, I develop with scalability in mind. If I do something with one file, chances are very high that I will need to do the same to other files as well. I have developed over time a class called OCFileParser that is the bridge between the directory system and editing logic.
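
OCFileParser itself is not shown here, so the following is not its implementation. It is a minimal sketch, in Python rather than Cocoa, of the general idea of a bridge between the directory system and editing logic: walk a folder tree and hand each file’s text to an edit function. The name process_tree and its parameters are hypothetical, and the sketch assumes the files are UTF-8.

```python
from pathlib import Path


def process_tree(root, edit, suffix=".xml"):
    """Apply edit(text) -> text to every matching file under root.

    Returns the list of files whose content actually changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*" + suffix)):
        # Work on bytes at the file boundary so line endings are left alone.
        original = path.read_bytes().decode("utf-8")
        updated = edit(original)
        if updated != original:
            path.write_bytes(updated.encode("utf-8"))
            changed.append(path)
    return changed
```

Because the editing logic is just a function passed in, whatever works on one file runs unchanged on the other 1,199.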

The benefit of all of this is that Cocoa’s NSXMLParser and NSXMLDocument classes make working with XML incredibly flexible and fast. XSLT has its uses, but it doesn’t have the same hooks as a full-fledged programming language.

The One Big Problem with those classes, however, is that they can crash with EXC_BAD_ACCESS on well-formed, valid XML. Out of the 1,200 titles I am working on, there are around 90 that exhibit this behavior, and they are a real mystery. Everything else in the toolbox has no problem with them, but the NSXMLDocument class just barfs on them. I am still trying to sort out whether there is some bug deep in the bowels of the classes (others exist, so this is entirely in the realm of possibility); whether how they link to external files is a problem; or whether this is a memory issue. 1,200 XML documents is a lot, especially since I am relying on ARC for memory management to speed development.

That’s the setup. I have one or two other apps I have to work with, but I save those for when I get truly desperate, and they are not really worth mentioning here (though I am getting close given that last bit). I have a couple more blog posts on how the whole thing works in practice that I am looking to get posted before the next semester begins.

Microsoft vs. LaTeX

Ed: This WordPress theme makes the titles all-caps, thus mangling “LaTeX.” My analytics should get interesting in a little while.

I haven’t read this entire article yet, but the opening paragraph has the best comparison of Word and LaTeX I’ve seen yet:

Microsoft Word is based on a principle called “What you see is what you get” (WYSIWYG), which means that the user immediately sees the document on the screen as it will appear on the printed page. LaTeX, in contrast, embodies the principle of “What you get is what you mean” (WYGIWYM), which implies that the document is not directly displayed on the screen and changes, such as format settings, are not immediately visible.
An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development

Between work and school, I deal with Word and LaTeX a lot. LaTeX less so than Word, given Word’s ease of use for everyone, but I work with enough math content at work that I needed to learn at least the basics. Once I got the hang of LaTeX, though, it became my “go to” for document preparation, despite LaTeX being a lot more crunchy than I think it needs to be (that’s a separate blog post entirely). Still, even after using LaTeX consistently for a few years, I find it hard to explain to someone who hasn’t so much as seen it.

This bit in the abstract is interesting as well:

We show that LaTeX users were slower than Word users, wrote less text in the same amount of time, and produced more typesetting, orthographical, grammatical, and formatting errors. On most measures, expert LaTeX users performed even worse than novice Word users. LaTeX users, however, more often report enjoying using their respective software. We conclude that even experienced LaTeX users may suffer a loss in productivity when LaTeX is used, relative to other document preparation systems.

I really need to read the article to find out why this is true, but two things come to mind immediately:

  • Know your tools. If LaTeX is a core requirement for submissions, then take the time to really learn it.
  • Always double-check your work. There are no excuses for not checking work before submission.

One of the weird quirks of AppleScript

I am prepping my company’s XML archive for uploading into MarkLogic (I know, English, right? I’ll post more about this in the near future). But within the archive is a bunch of PDF, EPUB, and image files I don’t want: 30,000 or so files that need deleting. (There are about 1,400 XML files. I’ll post more about that in the near future.)

I’m using AppleScript to crawl through the folder hierarchy and delete the files I don’t want, and I came across this weird bug. It turns out that this one-line form starts failing at some point as memory usage burgeons, and not even wrapping it in a try block helps:

tell application "Finder" to delete target_file

But this works as expected, complete with error handling in a try block, regardless of the environment:

tell application "Finder"
	delete target_file
end tell

Hacker’s Delight

I have been musing lately about the great disservice Apple did the world by making computers easy to learn — namely the fact that few people ever bother to learn about them. Who bothers to learn about them when, on the iPhone for instance, the case is sealed shut, the lifespan is 1 or 2 years for many purchasers, and the platform is closed in lots of ways?
My boys love 1986 computing

I hadn’t thought of the Mac that way before. I don’t know that I would go so far as to say that they have done everyone a “disservice,” but developing for Apple is not nearly as open or cheap as is developing for Android. Weighing the relative benefits of each is another discussion entirely.

To the point of the article, hacking away on some of my old Macs is something that I am looking forward to after I finish school. Back in the day, when I was plunking along on my Commodore 64, I was either playing cracked games or doing my homework in GEOS. GEOS was what got me to truly realize the computer’s potential, where I did a bunch of papers and art for classes. Even after the requisite computer classes, all I really walked away with was the ability to LOAD "*",8,1 enough to get to games and GEOS.

But now that I am wrapping up my computer science curriculum at school, and mopping up the last of my non-credit requirements, I am looking forward to booting up the original iMac in OS 8 I have sitting downstairs. I still have loads of old software, and I want to see some of what I missed the first time I used it. I’m a different user than I was so many years ago. Sharing all that with the kids is a pleasant bonus.

Sanitizing Strings with NSCharacterSet and NSScanner

Funny how, after all these years, I hadn’t needed to clean a string of an arbitrary set of unwanted characters on a large scale, but I am doing a ton more XML work these days, so it was bound to happen. I don’t remember where I stumbled upon the idea to use NSScanner, but once I saw it, it made perfect sense. I think there’s some excess baggage in the buffer string, and depending on how the loop is written the work lands anywhere between O(n) and O(n^2) (if not worse, actually). But this is certainly a more elegant solution than doing all the heavy lifting myself between the set and the string.

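- (NSString *)sanitizeString:(NSString *)str withSet:(NSCharacterSet *)set {
	NSScanner *scanner = [NSScanner scannerWithString:str];
	// NSScanner skips whitespace by default; turn that off so spaces survive.
	scanner.charactersToBeSkipped = nil;
	NSMutableString *result = [NSMutableString stringWithCapacity:str.length];
	NSString *buffer = nil;
	while ( ![scanner isAtEnd] ) {
		// Copy the next run of wanted characters, if there is one.
		if ( [scanner scanUpToCharactersFromSet:set intoString:&buffer] ) {
			[result appendString:buffer];
		}
		// Step over the run of unwanted characters that follows, if any.
		[scanner scanCharactersFromSet:set intoString:NULL];
	}
	return result;
}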

To use:

NSString *string = @"A string that might have funny characters in it.";
NSCharacterSet *set = [NSCharacterSet illegalCharacterSet];
// foo stands in for whatever object implements sanitizeString:withSet:
NSString *result = [foo sanitizeString:string withSet:set];
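
The complexity worry comes down to whether the result is built in one pass or the string is searched again from the start after every deletion. Here is a minimal sketch of the two shapes in Python, not the Cocoa API; both function names are hypothetical and exist only to show the contrast.

```python
def sanitize(text, unwanted):
    """Single pass, O(n): keep each character that is not in the unwanted set."""
    return "".join(ch for ch in text if ch not in unwanted)


def sanitize_by_repeated_delete(text, unwanted):
    """Find one unwanted character, delete it, then search again from the start.

    Each deletion copies the string and restarts the search, so a string
    full of unwanted characters costs roughly O(n^2).
    """
    while True:
        index = next((i for i, ch in enumerate(text) if ch in unwanted), None)
        if index is None:
            return text
        text = text[:index] + text[index + 1:]
```

Both return the same string; only the amount of rescanning differs, which matters once the input is 1,200 book-length files rather than one short sentence.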

Github Student Developer Pack

There’s no substitute for hands-on experience, but for most students, real world tools can be cost prohibitive. That’s why we created the GitHub Student Developer Pack with some of our partners and friends: to give students free access to the best developer tools in one place so they can learn by doing.
Github Student Developer Pack

The list of software and subscription offers is truly impressive. I won’t use half the stuff, but there are others, like Atom, that I’ve been wanting to try but haven’t, for precisely the reason they give.

About that “conversational tone”…

I find the conversational tone in some entry-level technical documentation maddening. To wit…

[insert technical information]

Now because we are all friends here I’m going to share a secret with you. Come closer. Get in here. Let’s have a huddle. [insert non-intuitive technical information]


Now don’t get too worked up. [insert follow-up technical information]
Laravel: Code Bright

Clearly, I am not the target audience, but I think this is beyond what is necessary. I would speculate that anyone who is reading about a PHP framework has enough experience with technical material that they don’t need their hand held like this. This amount of conversational tone makes using this material as a reference challenging later. There are ways to be conversational without being obstructive and wasteful of everyone’s time. I don’t see this very often where I roam around, but when I do, I cringe for the author every time I read it.

The same goes with including “um,” “hrm,” “erm,” and “well, let’s see…” and language devices of their ilk anywhere outside of an interview transcription. Just write the damn passage already.

A Dream Come True

The JavaScript OSA component implements JavaScript for Automation. The component can be used from Script Editor, the global Script Menu, in the Run JavaScript Automator Action, applets/droplets, the osascript command-line tool, the NSUserScriptTask API, and everywhere else other OSA components, such as AppleScript, can be used. This includes Mail Rules, Folder Actions, Address Book Plugins, Calendar Alarms, and Message Triggers.
Apple: JavaScript for Automation Release Notes

This is a dream come true for me; really heady stuff. AppleScript has been foundational to my career, but I never once, not once, liked the syntax or the environment. Giving JavaScript a first-class implementation could be very beneficial, I think, as there are a hell of a lot more JavaScript developers than there are AppleScript developers. The problem still exists with wonky scripting support in applications (I’m looking at you, Adobe with your fancy-pants JSX). Perhaps removing the AppleScript barrier to automation will bring some new talent into this niche area that has been too specialized for its own good. Up until now, the Yosemite update was pretty “meh,” but now I’m excited.

What do you care, Excel?

While doing some automation work, I unintentionally tried to open two files with the same name but from different directories in Excel. Then, this happened:


I don’t think this is something I’ve ever done in the past, but I also don’t understand why this would be a problem. If the complete paths were the same, I would understand, because Mac OS treats all network volumes as /Volumes/path/to/file.ext. The odds are small, but not to the point of being practically impossible, that two files on two external volumes could have the same path. But going by just the name seems…short-sighted…or something. Weird.