A software maintenance tale gone bad
A long time ago…
In 2011 I was hired by an e-shop owner to develop an addon for his e-shop. The software that powers his e-shop is a proprietary and well established program, we will refer to it as the Cart. It is not necessary to tell you names, companies and brands in order to follow the story. The addon’s purpose was to update the e-shop with data collected from his vendors catalogs in csv or xml format and in some cases to scrap data from their sites in order to fill information needed to present the products in the shop nicely. The goal was to automate the process of updating stock and prices to existing products and create new arrivals from the vendors.
The addon I created back then was importing data from three or four vendor feeds and using photos and technical specifications provided by some of them to enrich the presentation in the shop. The result was around 3kLOC object oriented code. The raw design was to have core component (named PLUpdater) that is doing the heavy lifting and inject to it vendor specific file loader and price mark up classes. A loader and markuper for the wanted vendor are instantiated and injected to PLUpdater. PLUpdater calls loader’s load method and get a collection (a PHP array) of ProductRow objects containing unified and cleared data from vendor files. Afterward iterates the collection and updates the database with product information. Markuper is used to markup the prices for the given vendor. Vendor feeds contain bulk prices, so they are transformed to retail prices. In addition, in each category shop administrator could define pricing rules for each vendor, based on buy price. In order to speed up the price calculation I had implemented a caching mechanism that cached rules from the database to a PHP class for each vendor. So PLUpdater was using a factory method to instantiate the class for given vendor.
When updating a shop from various vendors a problem that rises is duplicate products. Vendors do not usually use the same product codes when they sell a specific product. That means if Vendors A and B sell the device MyDevice 443 from the manufacturer ManufX with product code MD433, in their feeds the product will have their internal product code, ie X-443 and P-343-34. So retailer wants to have in their store only one instance of the product, hence it is the same product. At that time a product entity for each vendor was created in the database and applied simple rule so cheapest should be the only active. The rules at that point were to match products by EAN and activate the one that has the lower price. That is, if product MD433 is available from vendor A at $10 and from vendor at $12 the product “owned” by vendor A would be active and the product owned by vendor B would by hidden.
Four years and four or maybe more developers later I was called again to do a new extension to the add-on. At the present point, there were applied many business rules on product updates, vendor feeds are 18 and a third party service was employed to pull product information. Business rules for product filtering or data sanitation were global, applied to all vendors and vendor wise. The problem is that they were spread around the code, any place were the developer supporting the e-shop felt that there was “room” for few more lines.
Also the project was in the second generation of product merging. Instead of creating multiple entities for a product, one for each vendor. An extra database table was added to hold product per vendor specific information, such availability and pricing. A better solution than the original and worked perfectly. The implementation gave me trouble. After studding the code, I found out that there was a first attempt for product merging build from developer X and the currently working attempt build from developer Y. Of-course the fall back plan if the second attempt wasn’t making shop owner happy, was to conditionally select the method for product merging. That cant be bad if the code isn’t spread around 10KLOC of code and merged in huge methods and numerous functions.
Metrics and code inspection
In the next table we present basic metrics of initial and final version of the code using a simple tool Bergman’s phploc. Only core files are included, no vendor specific code.
Initial | Final | |
Size | ||
Lines of Code (LOC) | 1318 | 5510 |
Comment Lines of Code (CLOC) | 141 (10.70%) | 1054 (19.13%) |
Non-Comment Lines of Code (NCLOC) | 1177 (89.30%) | 4456 (80.87%) |
Logical Lines of Code (LLOC) | 426 (32.32%) | 1678 (30.45%) |
Classes | 410 (96.24%) | 1560 (92.97%) |
Average Class Length | 27 | 60 |
Minimum Class Length | 3 | 0 |
Maximum Class Length | 104 | 467 |
Average Method Length | 3 | 6 |
Minimum Method Length | 0 | 0 |
Maximum Method Length | 23 | 72 |
Functions | 3 (0.70%) | 18 (1.07%) |
Average Function Length | 1 | 3 |
Not in classes or functions | 13 (3.05%) | 100 (5.96%) |
Cyclomatic Complexity | ||
Average Complexity per LLOC | 0,17 | 0,41 |
Average Complexity per Class | 5,8 | 27,38 |
Minimum Class Complexity | 1 | 1 |
Maximum Class Complexity | 14 | 248 |
Average Complexity per Method | 1,61 | 4,11 |
Minimum Method Complexity | 1 | 1 |
Maximum Method Complexity | 7 | 106 |
Structure | ||
Classes | 15 | 26 |
Abstract Classes | 2 (13.33%) | 2 (7.69%) |
Concrete Classes | 13 (86.67%) | 24 (92.31%) |
Methods | 121 | 219 |
Scope | ||
Non-Static Methods | 114 (94.21%) | 206 (94.06%) |
Static Methods | 7 (5.79%) | 13 (5.94%) |
Visibility | ||
Public Methods | 89 (73.55%) | 171 (78.08%) |
Non-Public Methods | 32 (26.45%) | 48 (21.92%) |
Functions | 2 | 5 |
Lets try to understand the numbers. LOC grew about 3 times, expected due to new features added. But so did average class and method size! Combine this with CCN, average per class was 5 and grew up to 27. There were 15 files in initial version and 27 in final. Those imply that new code added to existing classes and methods. After a simple ispection like this, it is clear that code quality was the last worry of the developers. The core class was 204LOC and CCN 14 and finally it was 1032 LOC and CCN grew to 92! There are methods 300 to 400 lines long! The numbers motivated me to fully inspect the code and study the programming style. Next are presented some of the most resounding observations made.
In some cases developers tried to keep classes from growing more, moving code to global functions! An example is a class that updates product count in shop’s categories.
class ProductsCategoriesTotalCount{ public static function update(){ $root_categories=db_get_fields( "SELECT category_id FROM ?:categories WHERE parent_id=0 AND status=?s","A" ); if(!empty($root_categories)){ fn_my_changes_update_product_total_count($root_categories); } } }
The actual update is done in the global function named :
fn_my_changes_update_product_total_count
which calls two or three more global functions, all implemented by the same person. That function is also called from the functional part of the core system from another add-on, again implemented by the same person and obviously all implemented in the same owner’s request. The reasonable design would be to encapsulate those functions to a class and use the class in both cases.
A careful reader might noticed the long name of the function. A coding guidline in that cart software is that global functions should be prefixed with fn_ followed by the add-on name in order to avoid naming conflicts. This is not implied for class methods! A developer in his agony to avoid name conflicts named a class method:
fn_set_brands_features_move_cellphone_categories_per_brand
A function, with her pair fn_update_brand_feature a total 150 LOC could be a class of their own hence they perform a self contained task. Again, ignorance, minimal effort or following the wrong paradigm?
The Cart is a functional program and a part of its core is OO, the part of it that an add-on developer reads to find where to hook and extrend is the functional part. The authors have collected 900 system functions to a dozen of files. The most commonly used functions, ie the one that fetches product lists is more than 700LOC. This can be justified for some reasons, like fewer files means fewer disk reads with result faster program load. I observed Cart’s code evolution the last 5 years and I know that some parts of the code are only fixed and never refactored and also that the authors employ developers of various levels of experteese. It is expected in a corporation, pay an advanced engineer to design and a team of jounior or med level developers to maintain the project. Fact that is visible in the code also. Some parts are brilliantly implemented and some are novice. The question now forms to, is it good for a developer to follow the paradigm of a single project? Specially hack developers should educate their selfs at least studding other projects and try to employ some principles in their every day work?
There is one occurance that a try to OOP ends without meaning. The following class was defined:
class Weight{ public static function updatePrice($price,$weight,$supplierId){ if($supplierId==10 || $supplierId==13 || $supplierId==19){ if($weight>3){ $extra_weight=$weight-3; $price+=$extra_weight*3; } } return($price); } }
And used somewhere in a GOD class (1700LOC and average method CCN 42) like this:
$basePrice=Weight::updatePrice($basePrice,$this->weight,$supplier_id);
An understandable implementation would be, to define an interface of product data modifiers and declare some classes on it to be applied on product data to sanitaze or modify product data.
Lets see now an example of misusage of OO and the minimal effort development. In class BaseSupplierListLoader is defined an method that replaces color names in german to english. BaseSupplierListLoader is the abstract class for vendor specific file loader. A class named ProductRowHelper is the following:
// I want a concrete class of BaseSupplierListLoader only for the colors array class ProductRowHelper extends BaseSupplierListLoader{ public function __construct(){} public function loadFile(){} }
The comment on top of the code is original developer’s comment. ProductRowHelper is used just to replace color names in some other procedure other than loading data from vendor feed. Again developer could spend little more time (30 minutes maybe) and refactor BaseSupplierListLoader to use a new text utility class that replaces color names and employ that class to the new code he was developing. A refactoring with multiple advandages, moving 100 lines from BaseSupplierListLoader class (that is already 1100 lines) and define two methods in a new reusable class. I discovered that such a class could be employed three times with an effort no more than 1 hour.
As mentioned before, a huge amount of dead code was detected, first attempt of product merging. Noone dared or cared to clean up code and keep versions available to owner in future use or need. Particularly the code was unreachable. Tenths of IF block that the control variable was never taking the true value in order to execute the code. Had to read the code again and again to relize that the control variable was actually a dummy variable never changing its value.
What Went Wrong?
I hope that I gave you a picture of how hard became to maintane that code. After four years I was again the new programmer in this project and had to read the code for two or maybe three days to understand the logic. Not the details. But some questions raised:
- Why this happened?
- Does business owner is aware of this issue?
- Has he to be warried about this?
- Is it his fault?
- Is it PHP?
- Is it the e-shop software?
- Is it developers fault?
In an emerging e-shop it is expected to have new features implemented and in smalled sized businesses, that cant afford permanent IT personel, it is expected for developers to come and go. It is common practice in small business to employ freelancers paid by the hour (aka hourlies) to do certain fixes or implement a feature etc. Some times owner have more than one hourlies in their contacts so if one isnt available use another, or if one gives high bid ask others, or if someone cant do something complex use a more expensive but more experienced to do the task. So diferent levels of experience and knowledge will be visible in the code.
I will remind the context of the project. Its an addon for a well known proprietary software. The key phrase is “an addon”. Addon developers are usually called “hack programmers”. Dont be confused with security “hackers and crackers”. Hack programmers are programmers that make their livining developing addons for large projects. I am not implying that they are not good programmers. Many of them are. But many of them after years coding addons forget to look at the big picture and when it comes to larger pieces of software they cant apply anymore design principles and quality assurance. I havent realized that, till some one rejected me for a job because he saw in my CV that last 2 years I was a hack programmer. But I couldnt see the reason. After studing the code of that addon and trying to understand why my design turned from the hack programmers employed from shop owner to a spaggetti code. Indeed hack programmers may have extreeme knowledge of the software they work with and can implement excellect addons for it, but do they care for code design and quality as an application programmer would? Is it in the context of their job to? Usually addons (hacks) vary from 2 to 200 lines of code. And what is the target of their job? To hack little more functionality to the software. There is no reason to improve their design and quality skills, just to know better the software they hack.
In our case study the shoping cart softeware has a functional extension API utilizing hooks. Fact that drives developers to write functions to implement the hooks. Starting with few lines sometimes ends up to few 100s of lines to implement the asked featrure. The majority of addons I have seen follow that practice, while the hook function could instantiate a class that encapsulates the desired functionality and certainly would be more readable (and maintainable) than a 300 lines function.
But here is another question raised. In that 4 years period when developers should stand up and say it is time to refactor? It is a continoous process. There is no critical point, when adding new features or changing business rules, instead of following the minimal effort path, a little more effort can be applied to do a little refactoring and make the code a little better each time. My personal experience is that no project owner, even those with strict budgets, will complain if every two or three weeks implementing a change takes little longer that it should, an hour or two is enough. Not all of them understand the reason of refactoring, but doing it as code evolves want drive the code in state that massive refactoring would need to take place.
Definatelly it is no PHP’s issue. PHP is an extremely easy programming language for a novice to start with. Just write a few lines script and you are a programmer! But projects’ evolution drove them to elegant and complex designs, numerous of features, pushing dynamic language features further. Popular projects are well designed and very easy to extend that even novice coders can implement addons for them. My personal experience with hack developers country wide, is that the majority of them are not well educated and hence they manage to make money with basic knowledge they are not trying to improve their skills. Managing to place building blocks in right place without any deeper understanding is the common skill set of profecionals in e-commerce market.
But still, do the shop owner has to be aware of this, or has he to worry? Technical dept is called. Is he aware of it? Probably not, he is a business man not a programmer. Maybe someone mentioned that the code is messy, certainly not the one who did the most work on it. The later maybe isn’t able to code better, does not care, but in any way admiting to the owner that he produced low quality code, will hurt his reputation as a freelancer. The one who might mentioned code quality was trying to make original coder to look bad and take his place (common practice in some areas of the world) and at the end the owner thought that and ignored him. Freelancer world isnt always sunny. It’s a tough business and shop owners arent always the easiest people to handle and take money from them.
Will he eventually have to pay it? In short term maybe not. The group of hourlies (freelancers paid by the hour) he was employing those years are aware of the code they can easier maintain the code with out extra cost to the owner. But if they do not work on that part of the code for some time, 2 or 3 months, mayby again they will have to look around to find their way, raising the cost for the owner. In the unfortunate case he has to change developers? A developer might not be available at the time needed or even worse, is not interested any more for the certain client. In some cases a developer is hired to do a task and the owner asks him to do a change to that part of the code also. A new developer will charge him more due to the time that he will spend to understand the spaghetti code. The cost for that little fix will be multiple times higher if the new comer has to dig his way out in a messy code.
An estimation is that in those four years the owner payed more that 200 hours of developer work to maintain that part of his shop. The total refactoring time I spend is approximately 50 hours. A respected portion of that time was spend to understand the code, if it was done gradually as the code was growing by the original developers might be 20 to 30 hours. That is 5-8 hours per year! The benefits of well structured code are known and in cases where various developers are evolved it is more beneficial having code base that a new person can easily understand.
Technical dept is created every day, each time code is maintained. It is the result of developers and business owners culture. Business owners of small business cant be educated to check them selves quality of code and cant afford to hire a CTO to controll things. But they can be educated on managerial level on technical dept, so they can understand that it is their own benefit to hire developers that are also aware of it and how to keep it low. So its up to the developers. Developers and specially hack developers lack of education and care for their product, small or large. It is not intended, but its the culture. Do it fast and if it works is enough. What has to change is this culture. Software Engineering principles have to apply to any type and any level of development.
Illustrations from:
1. https://www.apps-world.net/blog/2013/10/17/how-appmachine-aims-lower-cost-development-professional-and-novice-developers-alike-interview-appmachine-founder-and-ceo-siebrand-dijkstra/
2. http://watchdogwire.com/florida/2013/08/26/book-review-what-went-wrong-by-jerome-corsi/