
A commercial database application I worked on
1999–2002

Introduction

In 1999, I was brought into a software project that had started about a month earlier. It was a read-only Windows desktop database application written in C++/MFC[1], with text data files processed by Perl scripts.

From 1999 through 2002, I put the equivalent of about 1½ years of full-time work into the project, and eventually became the sole/primary[2] developer.
Windows

I created a well-organized inheritance hierarchy for the windows in which the user selected database query parameters, and another hierarchy for the several graph styles used to display the data. I don't remember all the details at this point (April 2007), but I may also have created an overall base class for all the windows.

Selection windows shared a background color, a large number of functions to create, configure, and display standardized widgets, and code to retrieve selections and transfer them to the selections object.
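A minimal sketch of how such a selection-window hierarchy might be organized, written in plain standard C++ rather than MFC so that it compiles on its own. Every name here (`SelectionWindow`, `RegionWindow`, `CheckBox`, `Selections`, `addCheckBox`, `transferSelections`) and the color value are hypothetical stand-ins, not the original code. The base class owns the shared pieces; a derived window only lists its own widgets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the global object that collects the user's selections.
struct Selections {
    std::vector<int> chosen;  // selected data values, in window order
};

// Stand-in for an MFC check box.
struct CheckBox {
    std::string label;  // text the user sees
    int value;          // data value the box represents
    bool checked;
};

// Base class holding everything the selection windows share.
class SelectionWindow {
public:
    virtual ~SelectionWindow() {}

    // One background color for every selection window (placeholder RGB value).
    static const unsigned long kBackgroundColor = 0xE0F0FF;

    // Simulates the user ticking a box.
    void check(std::size_t index) {
        assert(index < boxes_.size());
        boxes_[index].checked = true;
    }

    // Copy the values of all checked boxes into the selections object.
    void transferSelections(Selections& out) const {
        for (std::size_t i = 0; i < boxes_.size(); ++i)
            if (boxes_[i].checked) out.chosen.push_back(boxes_[i].value);
    }

protected:
    // Standardized widget creation shared by all derived windows.
    void addCheckBox(const std::string& label, int value) {
        CheckBox box = {label, value, false};
        boxes_.push_back(box);
    }

private:
    std::vector<CheckBox> boxes_;
};

// A concrete window differs only in which widgets it creates.
class RegionWindow : public SelectionWindow {
public:
    RegionWindow() {
        addCheckBox("North", 0);
        addCheckBox("South", 1);
        addCheckBox("East",  2);
        addCheckBox("West",  3);
    }
};
```

Adding another selection window then means writing only a constructor full of `addCheckBox` calls; retrieval and transfer come from the base class.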

The graphs shared code to define standard colors, display a color key, define standard scaling factors, scale the graph automatically, and display the axis labels and graph title.
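A hedged sketch of a graph base class along those lines. The names (`Graph`, `BarGraph`, `standardColor`, `autoScaleMax`, `drawData`), the color values, and the 1-2-5 rule for "nice" axis maxima are assumptions for illustration; the text doesn't say how the original chose its scaling factors. Drawing is reduced to returning descriptive strings so the sketch runs without a GUI.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Base class for every graph style (hypothetical names throughout).
class Graph {
public:
    Graph(const std::string& title, const std::string& xLabel, const std::string& yLabel)
        : title_(title), xLabel_(xLabel), yLabel_(yLabel) {}
    virtual ~Graph() {}

    // Standard series colors shared by all styles (RGB values are placeholders).
    static unsigned long standardColor(std::size_t series) {
        static const unsigned long colors[] = {0x0000FF, 0x00A000, 0xFF0000, 0x808080};
        return colors[series % 4];
    }

    // Smallest "standard" axis maximum (1, 2 or 5 times a power of ten)
    // that is at least as large as every data value.
    static long autoScaleMax(const std::vector<long>& values) {
        long largest = 0;
        for (long v : values) {
            assert(v >= 0 && "sketch assumes non-negative values");
            if (v > largest) largest = v;
        }
        for (long power = 1;; power *= 10)
            for (long step : {1L, 2L, 5L})
                if (step * power >= largest) return step * power;
    }

    // Shared drawing sequence: title and auto-scaled axes, then the
    // style-specific data. Strings stand in for real drawing calls.
    std::vector<std::string> draw(const std::vector<long>& values) const {
        std::vector<std::string> steps;
        steps.push_back("title: " + title_);
        steps.push_back("axes: " + xLabel_ + " / " + yLabel_ +
                        " (0.." + std::to_string(autoScaleMax(values)) + ")");
        steps.push_back(drawData(values));
        return steps;
    }

protected:
    // Each graph style supplies only this part.
    virtual std::string drawData(const std::vector<long>& values) const = 0;

private:
    std::string title_, xLabel_, yLabel_;
};

class BarGraph : public Graph {
public:
    using Graph::Graph;

protected:
    std::string drawData(const std::vector<long>& values) const override {
        return "bars: " + std::to_string(values.size());
    }
};
```

With this split, a new graph style overrides a single function and inherits the title, labels, colors, and scaling.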
Database

One of the things I worried about most was the flow of information through the program. Associations had to be made correctly at every step: from the button labels visible to the user, to the internal button objects, to the global data object that stored the selections, to the database code, to the graph data object, and finally to the graph. To avoid mixups along the way, I defined a set of enumerations for the data values, all of which were discrete. At each step, every value passed through the program was clearly labeled with a descriptive name according to its meaning, which made confusion easy to avoid. The compiler caught some errors automatically, and when there was a bug somewhere, it was easy to read through each phase of the data-processing code to find it.

I've learned recently that there are better ways[3] of solving that problem, but I didn't know about them at the time. The only alternative the other developers suggested was the same architecture with character strings (no compile-time error checking) and meaningless alphanumeric codes (hard-to-read code) in place of the enumerations.
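The sketch below illustrates the enumeration approach with two hypothetical fields (`Region`, `AgeGroup`), a hypothetical `makeSelections` step, and a `label` function mapping values to button text. It uses C++11 `enum class`, which postdates the project; ordinary C++98 enumerations already refused implicit conversion from one enumeration type to another, so swapped arguments were a compile error then too.

```cpp
#include <cassert>
#include <string>

// One enumeration per database field (hypothetical fields and values).
enum class Region   { North, South, East, West };
enum class AgeGroup { Under18, Adult, Senior };

// Selections carry their meaning in their types, not in strings or codes.
struct QuerySelections {
    Region   region;
    AgeGroup ageGroup;
};

QuerySelections makeSelections(Region region, AgeGroup ageGroup) {
    QuerySelections s = {region, ageGroup};
    return s;
}

// Button label for each value, kept next to the enumeration it describes.
std::string label(Region r) {
    switch (r) {
        case Region::North: return "North";
        case Region::South: return "South";
        case Region::East:  return "East";
        case Region::West:  return "West";
    }
    return "?";  // not reached for valid enumerators
}

// makeSelections(AgeGroup::Adult, Region::North);  // compile error: arguments swapped
```

With strings or codes such as `"R2"` in place of these types, the commented-out call would compile and fail only at run time, if at all.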

There was no dynamic memory management at all in my code. None of the database fields had more than about 30 possible values, all of which were small integers, so they could be packed into fixed-size char arrays. The graphs could separate the data according to the values of up to two fields, so the graph data object was a 2D char array big enough to hold any combination of values from any two fields. It took up about 1 KB of memory and was very easy to work with.
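A minimal sketch of such a graph data object, assuming each field has at most 32 values (the text says "about 30"), which gives 32 × 32 one-byte cells, or exactly 1 KB. The text doesn't say what each cell held, so this sketch only shows the fixed layout and indexing; `GraphData`, `kMaxFieldValues`, and `at` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Assumed upper bound on the number of values any field can take.
const std::size_t kMaxFieldValues = 32;

// Graph data with one cell for every combination of values of the two
// fields the graph is split by. Fixed size, no heap allocation.
struct GraphData {
    unsigned char cells[kMaxFieldValues][kMaxFieldValues];

    // Zero every cell before filling the object for a new graph.
    void clear() {
        for (std::size_t r = 0; r < kMaxFieldValues; ++r)
            for (std::size_t c = 0; c < kMaxFieldValues; ++c)
                cells[r][c] = 0;
    }

    // Cell for a (first field value, second field value) pair.
    unsigned char& at(std::size_t firstValue, std::size_t secondValue) {
        assert(firstValue < kMaxFieldValues && secondValue < kMaxFieldValues);
        return cells[firstValue][secondValue];
    }
};

static_assert(sizeof(GraphData) == 1024, "32 x 32 one-byte cells");
```

Because the field values are small integers, they serve directly as array indices, with no lookup table between a value and its cell.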

To store user-selected values of individual database fields for use by the database code, I defined a small template class containing a fixed array (whose length was a template argument), a push function, a get function, and a clear function. Selections made in the selection windows were pushed onto this object, which was contained in the overall selections object, and then read by the database code. The array was cleared before the next query. It basically operated like a reusable grocery list.
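A sketch of a class like the one described, assuming one-byte elements to match the small integer values above. The names `SelectionList` and `Selections` are hypothetical, and `size` is an addition so that callers can loop over the contents.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-capacity list of selected values for one database field.
// Hypothetical reconstruction: the element type is assumed to be one byte.
template <std::size_t Capacity>
class SelectionList {
public:
    SelectionList() : count_(0) {}

    // Append a selected value; capacity is fixed at compile time.
    void push(unsigned char value) {
        assert(count_ < Capacity && "more selections than the field has values");
        values_[count_++] = value;
    }

    // Read back the value at a given position.
    unsigned char get(std::size_t index) const {
        assert(index < count_);
        return values_[index];
    }

    std::size_t size() const { return count_; }

    // Empty the list before the next query; the storage is simply reused.
    void clear() { count_ = 0; }

private:
    unsigned char values_[Capacity];
    std::size_t count_;
};

// Hypothetical overall selections object holding one list per field.
struct Selections {
    SelectionList<4>  regions;
    SelectionList<30> ageGroups;
};
```

Clearing only resets the count, which is what makes the list cheap to reuse from one query to the next.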

The database was Raimadb, an embedded network database system built on C with C++ wrappers. Another developer selected it and also wrote the Raima code that defined the records and extracted individual records from the database. Since this was a network database system rather than a relational one, I had to write all the code that iterated through the records, gathered and processed their data, and passed the results on to the graph.
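The Raima API itself isn't reproduced here. As a stand-in, this hypothetical sketch walks a `std::vector` of records, keeps those whose region was selected, and tallies them by region and outcome for the graph. The `Record` fields, the `tally` function, and the use of `int` counts are assumptions; the text doesn't say what the original stored in each graph cell.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in record: in the real program, records came from the network
// database's own navigation calls, which are not reproduced here.
struct Record {
    unsigned char region;
    unsigned char ageGroup;
    unsigned char outcome;
};

const std::size_t kMaxValues = 32;  // assumed upper bound on values per field

// Walk every record, keep those whose region was selected, and tally
// them by (region, outcome) for the graph.
void tally(const std::vector<Record>& records,
           const std::vector<unsigned char>& selectedRegions,
           int counts[kMaxValues][kMaxValues]) {
    for (std::size_t i = 0; i < records.size(); ++i) {
        const Record& rec = records[i];
        bool selected = false;
        for (std::size_t s = 0; s < selectedRegions.size(); ++s)
            if (rec.region == selectedRegions[s]) selected = true;
        if (!selected) continue;
        assert(rec.region < kMaxValues && rec.outcome < kMaxValues);
        ++counts[rec.region][rec.outcome];
    }
}
```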

The most notable part of the record-processing code was a looping construct that had to be used in two different places, combined with two different kinds of other (inner or outer) loops, in exactly the same way. Instead of trying to maintain two copies of that looping code, I defined a macro with a VERY_LONG_DESCRIPTIVE_LOOP_NAME and used it in both places. It was never a problem.
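Something along these lines, using a hypothetical loop over records in the selected regions; the macro name, `Record`, `isSelected`, and both callers are illustrative assumptions. In the first caller the macro is the outer loop, and in the second it is the inner one. The `if (...) {} else` form is a common defensive idiom: it keeps an `else` written after the macro from binding to the macro's hidden `if`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record with two one-byte fields.
struct Record {
    unsigned char region;
    unsigned char outcome;
};

// True if value appears in the list of selected values.
inline bool isSelected(unsigned char value, const std::vector<unsigned char>& selected) {
    for (std::size_t s = 0; s < selected.size(); ++s)
        if (selected[s] == value) return true;
    return false;
}

// The shared loop, written once and reused in two places.
#define FOR_EACH_RECORD_IN_A_SELECTED_REGION(records, selected, i) \
    for (std::size_t i = 0; i < (records).size(); ++i)              \
        if (!isSelected((records)[i].region, (selected))) {} else

// Use 1: the macro is the outer loop around an inner loop over outcomes.
int countSelectedOutcomes(const std::vector<Record>& records,
                          const std::vector<unsigned char>& regions,
                          const std::vector<unsigned char>& outcomes) {
    int total = 0;
    FOR_EACH_RECORD_IN_A_SELECTED_REGION(records, regions, i) {
        for (std::size_t k = 0; k < outcomes.size(); ++k)
            if (records[i].outcome == outcomes[k]) ++total;
    }
    return total;
}

// Use 2: the macro is the inner loop inside an outer loop over datasets.
int countAcrossDatasets(const std::vector<std::vector<Record> >& datasets,
                        const std::vector<unsigned char>& regions) {
    int total = 0;
    for (std::size_t d = 0; d < datasets.size(); ++d) {
        FOR_EACH_RECORD_IN_A_SELECTED_REGION(datasets[d], regions, i) {
            ++total;
        }
    }
    return total;
}
```

A function taking a callback could do the same job in modern C++; the macro keeps the loop body inline, which matches the approach described above.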

I also worked out the database design. All of the database fields were compact (one byte), and any and all of the fields could potentially be used in selecting records. This meant that partitioning the records horizontally didn't make sense. I was interested in vertically partitioning the databases into smaller pieces for better performance, but didn't have enough time. In the end, I just used a single monolithic record type for each dataset. Searches were fast even with the simple design.
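A hedged sketch of a monolithic record with one byte per field, plus a selection check that works on any combination of fields. The field list, `Record`, `Criterion`, and `matches` are hypothetical. A horizontal split on one field would help only queries that filter on that field, and here every field was a potential criterion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical field list; every field is one byte.
enum Field { kRegion, kAgeGroup, kOutcome, kYear, kFieldCount };

// One monolithic record type per dataset: all fields stored together.
struct Record {
    unsigned char field[kFieldCount];
};

static_assert(sizeof(Record) == kFieldCount, "one byte per field");

// A criterion: the field must equal one of the allowed values.
struct Criterion {
    Field field;
    std::vector<unsigned char> allowed;
};

// A record matches if it satisfies every criterion, whichever fields
// those criteria happen to name.
bool matches(const Record& rec, const std::vector<Criterion>& criteria) {
    for (const Criterion& c : criteria) {
        bool ok = false;
        for (unsigned char v : c.allowed)
            if (rec.field[c.field] == v) ok = true;
        if (!ok) return false;
    }
    return true;
}
```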
Perl scripts

I did about half of the work on the Perl scripts. First, another programmer hard-coded a single prototype script for one of the application's several datasets. I then abstracted it into three scripts that automated reading and crunching the data for all the datasets. The first searched each header line to find the position of each column name. The second processed the data for each dataset, which had to be massaged somewhat because it wasn't quite in the form we needed. The third was the entry point: it ran the other two once for each dataset, managing input and output files as appropriate, so all the datasets were processed in one run. It was also easy to extend; new datasets could be added with virtually no effort beyond specifying whatever fields or calculations differed from the existing ones.

To find out which column names to look for, the header script read the processing script itself. That was, of course, a quick-and-dirty hack, but it worked, and I had other things to do. Later, the other programmer fixed my script-reading-a-script hack by putting the domain information into a config file and making the Perl scripts read it.
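The scripts were Perl, but for consistency with the other examples here, this is a C++ sketch of the header script's job: find where each wanted column name appears in a header line. The tab delimiter and the name `findColumns` are assumptions; `wanted` plays the role of the list of names that later came from the config file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Map each wanted column name to its position in the header line.
// Names that don't appear are simply absent from the result.
// Assumption: columns are tab-separated.
std::map<std::string, std::size_t> findColumns(const std::string& headerLine,
                                               const std::vector<std::string>& wanted) {
    std::map<std::string, std::size_t> positions;
    std::istringstream in(headerLine);
    std::string name;
    for (std::size_t index = 0; std::getline(in, name, '\t'); ++index) {
        for (const std::string& w : wanted)
            if (name == w) positions[w] = index;
    }
    return positions;
}
```

Looking columns up by name rather than by fixed position is what lets the same processing code handle datasets whose columns appear in different orders.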
Formatting

While working on this project, I developed a very visual formatting style that I've used universally ever since. One of the few places I don't mind code duplication is where a small chunk of code is repeated two or three times with no other code in between: enough copies to warrant attention, but not enough to justify wrapping the code in a function or macro. In that case, I like to write each chunk out on one line and stack the lines. Each line is padded with whitespace as needed so that corresponding parts line up and the code looks like a table. This format sacrifices some syntactic information in exchange for semantic information and compactness.
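A small hypothetical example of the style: a bounding-box update where the per-axis chunk repeats twice, so each copy gets one line and the columns are padded to line up. A slip such as `p.x` on the Y line stands out because it breaks the column pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Point { int x, y; };
struct Box   { int minX, maxX, minY, maxY; };

// Bounding box of a non-empty set of points. The per-axis chunk is
// repeated once per axis, one line each, aligned like a table.
Box boundingBox(const std::vector<Point>& points) {
    assert(!points.empty());
    Box box = { points[0].x, points[0].x, points[0].y, points[0].y };
    for (const Point& p : points) {
        box.minX = std::min(box.minX, p.x);   box.maxX = std::max(box.maxX, p.x);
        box.minY = std::min(box.minY, p.y);   box.maxY = std::max(box.maxY, p.y);
    }
    return box;
}
```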

I used this style extensively, and found it very helpful in avoiding bugs, since it makes many of them readily visible. It also reduced the line counts of the many repetitive window files, each of which had a large number of buttons and other widgets being created by the same functions. The program came out to about 40,000 lines, although, if it had been written with less attention to code duplication, I suspect it might have been closer to 100,000 lines.
Summary

The program was eventually finished with no outstanding bugs, as far as anyone involved could tell, and with all the features the client requested.

The program design was solid but not exotic. The C++ code used templates and macros extensively, but had no dynamic memory management at all. The database was a single, monolithic set of complete records for each dataset. The project was completed not through fancy computational acrobatics, but through relentless application of the KISS principle.

One general feature of the program, and of all my programming ever since, was close adherence to the DRY[4] principle. There was very little duplication anywhere in the program.
Footnotes

[1] Microsoft Foundation Classes (MFC) were used for the GUI.
[2] Several other people worked on non-programming aspects of the program, including testing and the tooltip .doc files.
[3] See Refactoring, by Martin Fowler. I think the refactoring is Replace Data Value with Object.
[4] Don't Repeat Yourself. See The Pragmatic Programmer, by Andrew Hunt and David Thomas.


© 2007 Dan Bensen