Code bloat


Code bloat is the production of code that is perceived as unnecessarily long, slow, or otherwise wasteful of resources. It can be caused by inadequacies in the language in which the code is written, by inadequacies in the compiler used to compile it, or by the programmer. The term generally refers to source code size (as produced by the programmer), but it is sometimes used instead for the generated code size or even the binary file size.

Common causes

Often, bloated code results from a programmer simply using more lines of code than the optimal solution to a problem requires.

Some reasons for programmer-derived code bloat are:

  • overuse of object-oriented programming (OOP) constructs: classes and inheritance can lead to messy and confusing designs that often take many more lines of code than an optimal solution.
  • incorrect usage of design patterns: OOP developers often attempt to "force" design patterns onto problems that do not need them.
  • overuse of OOP methods, functions, or procedures: breaking an algorithm up into many methods lets developers reuse those methods to solve other problems. Taken too far, however, it adds code bloat, makes the code difficult to read and debug, and can reduce efficiency.
  • declarative programming: implementing a declarative programming style in an imperative or OOP language often leads to code bloat.
  • excessive loop unrolling: unrolling that is not justified by improved performance.
  • excessive use of multiple conditional if statements: for instance, where a lookup table could be used instead.
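
The last item in the list above can be illustrated with a minimal sketch. The function names and the status-code data are hypothetical and chosen only for illustration; both versions give the same result for the numeric codes shown.

```javascript
// Bloated: one conditional statement per case.
function statusLabelWithIfs(code) {
    if (code === 200) { return 'OK'; }
    if (code === 301) { return 'Moved Permanently'; }
    if (code === 404) { return 'Not Found'; }
    if (code === 500) { return 'Internal Server Error'; }
    return 'Unknown';
}

// Compact: the cases live in a table and the logic is a single lookup.
var STATUS_LABELS = {
    200: 'OK',
    301: 'Moved Permanently',
    404: 'Not Found',
    500: 'Internal Server Error'
};

function statusLabelWithTable(code) {
    // hasOwnProperty guards against inherited keys such as "toString".
    return Object.prototype.hasOwnProperty.call(STATUS_LABELS, code)
        ? STATUS_LABELS[code]
        : 'Unknown';
}
```

Adding a new case to the table version means adding one line of data rather than another branch of logic.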

Some naïve implementations of the template system employed in C++ are examples of inadequacies in the compiler used to compile the language.

A naïve compiler implementing this feature can introduce a version of each method of a template class for every type it is used with. This in turn leads to compiled methods that may never be used, thus resulting in code bloat. More sophisticated compilers and linkers detect the superfluous copies and discard them, or avoid generating them at all, reducing the bloat. Thus template code can result in smaller binaries, because a compiler is allowed to discard this kind of dead code.[1]

Some examples of bloat derived from a naïve compiler include:

  • dead code: code that is executed but whose result is never used.
  • redundant calculations: re-evaluating expressions that have already been calculated once. Such redundant calculations are often generated when implementing bounds-checking code to prevent buffer overflow. Sophisticated compilers calculate such values exactly once and eliminate the subsequent redundant calculations, using common subexpression elimination and loop-invariant code motion.
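
The effect of loop-invariant code motion can also be written out by hand. The following is a minimal sketch with hypothetical function names; it shows the transformation in source form and makes no claim about what any particular compiler or JavaScript engine actually does.

```javascript
// Redundant: scale * offset never changes inside the loop,
// yet it is re-evaluated on every iteration.
function shiftAllRedundant(values, scale, offset) {
    var result = [];
    for (var i = 0; i < values.length; i++) {
        result.push(values[i] + scale * offset);
    }
    return result;
}

// Hoisted: the invariant expression is computed once, before the loop.
function shiftAllHoisted(values, scale, offset) {
    var shift = scale * offset;
    var result = [];
    for (var i = 0; i < values.length; i++) {
        result.push(values[i] + shift);
    }
    return result;
}
```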

Examples

The following JavaScript function uses a large number of redundant variables, unnecessary logic, and inefficient string concatenation.

// Complex 
function TK2getImageHTML(size, zoom, sensor, markers) {
    var strFinalImage = "";
    var strHTMLStart = '<img src="';
    var strHTMLEnd = '" alt="The map"/>';    
    var strURL = "http://maps.google.com/maps/api/staticmap?center=";
    var strSize = '&size='+ size;
    var strZoom = '&zoom='+ zoom;
    var strSensor = '&sensor='+ sensor;    
 
    strURL += markers[0].latitude;
    strURL += ",";
    strURL += markers[0].longitude;
    strURL += strSize;
    strURL += strZoom;
    strURL += strSensor;
 
    for (var i = 0; i < markers.length; i++) {
        strURL += markers[i].addMarker();
    }
 
    strFinalImage = strHTMLStart + strURL + strHTMLEnd;
    return strFinalImage;
};

The same logic can be stated more efficiently as follows:

// Simplified (assumes a TK2 namespace object already exists)
TK2.getImageHTML = function(size, zoom, sensor, markers) {
    var url = [ 'http://maps.google.com/maps/api/staticmap',
        '?center=', markers[0].latitude, ',', markers[0].longitude,
        '&size=', size,
        '&zoom=', zoom,
        '&sensor=', sensor ]; 
    for (var i = 0; i < markers.length; i++) {
        url.push(markers[i].addMarker());
    }
    return '<img src="' + url.join('') + '" alt="The map" />';
};

Code density of different languages

The difference in code density between computer languages is so great that a program written in a "compact" language (such as a domain-specific programming language, Microsoft P-Code, or threaded code), together with a native-code interpreter for that language, often needs less memory than the same program written directly in native code.

Performance implications

In many cases, when two programs implement the same functionality, the larger program also runs slower than the smaller one. There are, however, a few cases that involve a space-time tradeoff, in which a larger program can run faster than a smaller one.
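
One common form of this tradeoff is a precomputed table. The sketch below, with hypothetical function names, counts the set bits of a 32-bit integer in two ways: a short loop that handles one bit per step, and a longer version that spends 256 table entries to handle one byte per step. Whether the table version is actually faster depends on the JavaScript engine and the hardware, and would need to be measured.

```javascript
// Smaller: examines one bit per iteration.
function popcountLoop(n) {
    var count = 0;
    n = n >>> 0;                 // treat n as an unsigned 32-bit integer
    while (n !== 0) {
        count += n & 1;
        n >>>= 1;
    }
    return count;
}

// Larger: a 256-entry table holds the bit count of every possible byte.
var BYTE_POPCOUNT = [];
for (var b = 0; b < 256; b++) {
    BYTE_POPCOUNT[b] = popcountLoop(b);
}

// Four table lookups replace up to 32 loop iterations.
function popcountTable(n) {
    n = n >>> 0;
    return BYTE_POPCOUNT[n & 0xff] +
           BYTE_POPCOUNT[(n >>> 8) & 0xff] +
           BYTE_POPCOUNT[(n >>> 16) & 0xff] +
           BYTE_POPCOUNT[n >>> 24];
}
```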

Reducing bloat

Some techniques for reducing code bloat include[2]:

  • refactoring a commonly used code sequence into a subroutine and calling that subroutine from several locations, rather than copying and pasting the code at each of those locations;
  • reusing subroutines that have already been written (perhaps with additional parameters), rather than rewriting them from scratch as new routines.
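
Both techniques can be sketched together. All names below are hypothetical; the "before" functions repeat the same formatting sequence, while the "after" version moves it into one subroutine and uses an additional parameter to cover both cases.

```javascript
// Before: the same trimming and capitalizing sequence is pasted twice.
function describeUserPasted(user) {
    var name = user.name.trim();
    name = name.charAt(0).toUpperCase() + name.slice(1);
    return 'User: ' + name;
}

function describeCityPasted(city) {
    var name = city.name.trim();
    name = name.charAt(0).toUpperCase() + name.slice(1);
    return 'City: ' + name;
}

// After: the shared sequence lives in one subroutine...
function capitalize(text) {
    var trimmed = text.trim();
    return trimmed.charAt(0).toUpperCase() + trimmed.slice(1);
}

// ...and an extra parameter (the label) replaces the second copy.
function describe(label, item) {
    return label + ': ' + capitalize(item.name);
}
```

A later fix to the capitalization logic then needs to be made in one place only.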

References

  1. ^ hopl-may.dvi
  2. ^ "Code bloat". DocForge. http://docforge.com/wiki/Code_bloat. Retrieved 30 December 2009. 

Wikimedia Foundation. 2010.
