

Simplify Searching with Multiple File Extensions

Authored April 2000

Search worms (file system explorers like Verity and Index Server) generally use file extension and folder to determine which files to include in an index. So, for example, we could instruct Verity to create a collection out of all the ".HTM" and ".CFM" files in the "/content" folder. Although this works very well for content-heavy sites, it begins to fail for application-like sites. There are always many files that you don't want indexed: functional includes like shopping carts and database update pages, custom tags, and so on.

A solution using Extensions

This problem has most often been addressed through directory segmentation: placing "non-searchable" files in a separate directory, often called "Scripts". This can cause issues with portability and maintenance, since files that work together are not kept together.

For example purposes, let's create a new ColdFusion extension: "CFMS" ("CFM System"). "CFMS" will be used on "code only" templates such as form and calculator results, files that make up multi-template "wizards", shopping cart templates, and so on. Basically, any file that would be confusing as a search result. Note that the actual extension used is up to you. We prefer appending a single letter to "CFM", but the extension could be "Fred" just as easily. You may also create as many extensions as make sense to you.

Adding the Extension

You can now create files with the new extension and they will be parsed by the ColdFusion engine.

The Pros

Once configured properly, there is no functional difference between your new extension and the default extension(s). ColdFusion will parse "Index.CFMS" just as it would "Index.CFM". The main benefit is that it's now a very easy task to "hide" these files from the Verity Search Engine or Index Server: simply don't include "CFMS" in the searchable extensions list. No fuss, no muss and, most importantly, no loss in performance.
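
To make the "searchable extensions list" idea concrete, here is a minimal, standalone Python sketch of extension-based selection. It is only an illustration of the principle and does not use Verity's, Index Server's, or ColdFusion's actual APIs. The file paths, the `SEARCHABLE_EXTENSIONS` set, and both function names are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical list of extensions the search engine is told to index.
# ".cfms" is deliberately left out, so those templates stay hidden.
SEARCHABLE_EXTENSIONS = {".htm", ".cfm"}


def is_searchable(path, extensions=SEARCHABLE_EXTENSIONS):
    """Return True when the file's extension is in the searchable list."""
    # Compare case-insensitively so "Index.CFM" and "index.cfm" match alike.
    return PurePosixPath(path).suffix.lower() in extensions


def select_for_index(paths, extensions=SEARCHABLE_EXTENSIONS):
    """Keep only the paths an extension-based indexer would pick up."""
    return [p for p in paths if is_searchable(p, extensions)]


# Hypothetical site layout: code-only templates sit next to content pages.
site_files = [
    "/content/index.cfm",
    "/content/about.htm",
    "/content/cart.cfms",
    "/content/order_results.CFMS",
    "/content/products.CFM",
]

indexed = select_for_index(site_files)
```

In this sketch the ".cfms" templates share a folder with the content pages, yet they never reach the index because their extension is simply absent from the list. No per-file rules or directory moves are involved.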
Applying this technique to certain application architectures can be very useful. Many developers, for example, use common prefixes to denote specific file types. Prepending "DB_" to files that do only database access is one example. Such prefixes can be replaced with custom extensions to gain a bit more flexibility.

The Cons

Although not technically a con, it should be obvious that a site could easily be architected in such a way as to make this technique unnecessary. It should also be fairly clear that retrofitting a site with this technique may be more trouble than it's worth. This technique makes sense for small sites or new development more than anything else. However, a largish site that's running into problems creating a workable search engine may find this technique easier than a complete rebuild.

The biggest potential problem with this technique is an obvious lack of portability. The non-standard extensions mean confusion when distributing your code outside your organization. Even in a smooth transfer there is still added server configuration time, at least initially, to handle the new extensions.

Conclusion

Although not for everybody, this technique can greatly simplify searchable sites with high code content. Unlike other possible solutions (strict code segmentation or advanced search logic, to name two), there is no performance or development time loss.