Considerations of the file_wrangler_2 Base Class

CDFileRepresentation is the core base class upon which file_wrangler_2 is built. For every file and folder a user of the program might want to rename, one CDFileRepresentation stands in for that object. New file names are often derived from the metadata of each file and folder in a renaming session. For example, one may want the “modification date” or some sort of EXIF data inserted into the template-derived new file names. We don’t want to hit the filesystem over and over again to request this information for 30,000 files. We still need the information, though, so CDFileRepresentation objects cache it for us.

The CDFileRepresentation class has to be able to instantiate itself quickly, providing the “most likely useful” information to the user immediately, but it also needs to minimize its hits to the filesystem when the user requests certain types of data. Apple provides numerous ways of obtaining file metadata, including Cocoa’s NSFileManager method attributesOfItemAtPath:error:, Launch Services’ LSCopyItemInfoForRef(), and the Spotlight metadata via MDItemCreate().

I’ve been quite surprised at the amount of overlap in the data obtainable through these various methods, and also a touch disappointed that one or two items of interest (like checking whether an item is an Application or not) are available through some methods but not others. Basically, there doesn’t seem to be “one true way” to obtain the information about a file that matches the user’s mental model of what is going on.

So, this means hitting the filesystem in at least two different ways to obtain all information that is of interest. An implementation pattern that Apple encourages is that of “lazy loading.” Basically it just means “don’t do the work until you’re asked for the information” and it has proven to be very successful in the design of the CDFileRepresentation class. Before implementing lazy loading, in initial testing under semi-idealized conditions, 1774 CDFileRepresentation objects took 2.7 seconds to instantiate.

After looking at the types of data a CDFileRepresentation needs to be able to model for the user, I found that some ivars could be grouped by “most efficient method of extraction”. By breaking the code apart and extracting a whole group of like data the first time any arbitrary piece of it is requested (creation date, for example), we can control how much time the user spends waiting for any given request. This has the nice effect of amortizing wait time over the length of a file_wrangler_2 session. It also means we never load types of data the user never requests.
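To make the grouping idea concrete, here is a minimal sketch of it. It is written in Python rather than Objective-C purely so it can be run anywhere, and it is not the actual CDFileRepresentation code: the class name FileRepresentation, the attribute names, and the stat_calls counter are all hypothetical, and os.stat() merely stands in for whichever Apple API serves a given group.

```python
import os
import stat


class FileRepresentation:
    """Hypothetical stand-in for CDFileRepresentation (illustration only)."""

    def __init__(self, path):
        # Instantiation stores only the path, so it never touches the filesystem.
        self.path = path
        self._stat_group = None  # cache for everything one os.stat() call returns
        self.stat_calls = 0      # counts filesystem hits, for demonstration only

    def _load_stat_group(self):
        # The first request for ANY attribute in this group loads the whole group.
        if self._stat_group is None:
            self.stat_calls += 1
            info = os.stat(self.path)
            self._stat_group = {
                "size": info.st_size,
                "modified": info.st_mtime,
                "is_directory": stat.S_ISDIR(info.st_mode),
            }
        return self._stat_group

    @property
    def size(self):
        return self._load_stat_group()["size"]

    @property
    def modified(self):
        return self._load_stat_group()["modified"]

    @property
    def is_directory(self):
        return self._load_stat_group()["is_directory"]


if __name__ == "__main__":
    rep = FileRepresentation(os.getcwd())
    print(rep.stat_calls)    # no filesystem hit yet
    print(rep.is_directory)  # first request loads the whole group
    print(rep.modified)      # served from the cache
    print(rep.stat_calls)    # still a single filesystem hit
```

A second group (say, Spotlight-style metadata) would get its own cache and loader in the same way, so a user who never asks for that data never pays for it.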

After these and other optimizations (for example, switching to the C struct LSItemInfoRecord to obtain certain booleans), I was able to reduce instantiation time roughly 16x. It now takes 0.17 seconds to instantiate the 1774 objects, plus a one-time filesystem hit of 0.19 seconds to obtain the full metadata for those objects (subsequent requests complete in about 0.09 seconds).
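For anyone who wants to check the arithmetic, the figures above work out as follows. This is only a back-of-the-envelope calculation in Python using the timings quoted in this post; the variable names are mine, and adding the one-time metadata fetch to the instantiation time is my own reading of the “worst case” for a first pass.

```python
objects = 1774
before = 2.7        # seconds to instantiate, before lazy loading
after = 0.17        # seconds to instantiate, after the optimizations
first_fetch = 0.19  # seconds, one-time fetch of the full metadata

speedup = before / after                        # about 15.9, i.e. roughly 16x
per_object_before_ms = before / objects * 1000  # about 1.52 ms per object
per_object_after_ms = after / objects * 1000    # about 0.096 ms per object

# Even if the user immediately asks for everything, the first pass costs
# instantiation plus the one-time fetch: about 0.36 s, still about 7.5x faster.
worst_case = after + first_fetch
worst_case_speedup = before / worst_case

print(round(speedup, 1), round(worst_case, 2), round(worst_case_speedup, 1))
```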

I will, of course, continue to research ways to reduce these times even more, but there will be a limit to what can be done while still providing the functionality everyone needs. This all raises the question, “How fast does it need to be?” and I would like to tackle that in a future post when I have more built around this class.
