If you’ve ever had a chance to discuss software performance with your engineering colleagues, you’ve probably noticed that a lot of us like to throw the infamous “Big O” around to explain why something is running slow (or fast).
You’ve probably heard of something “running in polynomial time” or “exponential” or “logarithmic” and how some of these are better than others, but you’ve never heard an actual explanation of why they are better.
My personal experience is that, when it comes to discussing algorithms, people rarely bother to explain what exactly a certain algorithm performs better at when compared to the others. If I told you that, when it comes to driving speed, a BMW is a lot faster than a dirt bike, you would probably immediately notice that there is a catch: the conditions under which their performances are measured differ drastically.
The same line of logic can and should be used when we discuss algorithms: consider the context before slapping a label on it. But before we dig into the actual definitions and explanations, let’s quickly go through some common misconceptions about Big O in discussions of algorithm performance.
“Big O tells you how fast an algorithm runs.”

This is the one you’re most likely to hear when discussing complexity with someone who hasn’t fully grasped the concepts yet. Most of the time, Big O cannot tell you anything about the actual running time of an algorithm.
While that is partially correct, by definition Big O is not solely about time. It can be used to describe space complexity as well as time complexity, or any other complexity you want.
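To make the distinction concrete, here is a small sketch of my own (not from the original article): both functions below run in O(n) time, but they differ in how much extra memory they allocate.

public int Sum(List<int> numbers)
{
    int sum = 0;                    // O(1) extra space: a single variable
    foreach (int x in numbers)
        sum += x;
    return sum;
}

public List<int> Doubled(List<int> numbers)
{
    var result = new List<int>();   // O(n) extra space: the output grows with the input
    foreach (int x in numbers)
        result.Add(x * 2);
    return result;
}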
From a software engineering standpoint, one definition I like to use is that Big O actually represents how fast things can get really ugly. For example, take a look at questions like the following: Why does this page load instantly with a hundred records, but grind to a halt with a hundred thousand? Will this nightly job still finish on time once our user base doubles?

If these are the kinds of questions you find yourself unable to answer, you just might have the type of problem that needs a correct time complexity identification and a potential remedy. Being able to recognize potential issues with your complexity may save you hours or days of needless performance monitoring and examination.
Let us dig deeper now and try a different definition.
Given an input of size n, Big O describes an upper bound on the number of relevant, primitive operations an algorithm needs to perform.
Both mathematically imprecise and somewhat vague, but still a great starting point. Let us dive right into it; we will slide into more formal definitions later.
What is a primitive operation?
Primitive operations are basic computations performed by an algorithm. Examples are evaluating an expression, assigning a value to a variable, indexing into an array, calling a method, returning from a method, etc. They are easily identifiable in pseudocode and largely independent of the programming language.
Let us take a look at an example:
public void SimpleFunction(List<int> n)
{
    int a = 1;
    int b = 2;
    int sum = a + b;
    Console.WriteLine(sum);
}
How many primitive operations do we have here? Let’s break it down:
public void SimpleFunction(List<int> n)
{
    int a;                   // declaration
    a = 1;                   // assignment
    int b;                   // declaration
    b = 2;                   // assignment
    int sum;                 // declaration
    sum = a + b;             // two primitive operations: addition and assignment
    Console.WriteLine(sum);  // method call
}
So, we have three declarations, three assignments, an addition and a method call. If we agree on these being our primitive operations, we should be able to work out the following: given an input of size n, the number of operations this function will perform is 8, or more formally, f(n) = 8.
We can obviously see that the size of the input does not matter here at all; this function will always perform the same, constant number of operations.
Let’s try a different one:
public void SumFunction(List<int> n)
{
    int sum = 0;
    for (int i = 0; i < n.Count; i++)
    {
        sum = sum + i;
    }
    sum = sum + 5;
    Console.WriteLine(sum);
}
This one looks harder to break down, but let us give it a shot. One thing that will make it easier for us is to convert this for loop into an equivalent while loop.
public void SumFunction(List<int> n)
{
    int sum;                 // declaration
    sum = 0;                 // assignment
    int i;                   // declaration
    i = 0;                   // assignment

    while (i < n.Count)      // n + 1 comparisons (the last one fails and exits the loop)
    {
        sum = sum + i;       // n additions and n assignments
        i = i + 1;           // n additions and n assignments
    }

    sum = sum + 5;           // one addition and one assignment
    Console.WriteLine(sum);  // method call
}
Well, obviously we have 2 declarations and 2 assignments at the start, but there is also something else. The loop condition will evaluate i against n.Count a total of n + 1 times (the last comparison fails and exits the loop), and each of the n iterations performs an addition and an assignment for sum, plus another addition and assignment for i. Additionally, we will stir this up with one more addition at the end. So, this is how we stand at the moment: 2 declarations, 2 assignments, n + 1 comparisons, 2n additions, 2n assignments, 1 more addition, 1 more assignment and 1 method call.
If you remember from the last definition, if we want to measure something, we need to decide which primitive operation is relevant for us here. Once we do that, we disregard all the others, treating them as if they take no resources at all, neither time nor space, and we focus only on the relevant, primitive operations.
Please note that what counts as a primitive operation depends hugely on a number of factors, like compiler implementation or processor architecture. However, for practical purposes, you can declare anything a primitive operation, based on the level of abstraction that you want to use as your point of view.
A reasonable thing for us to do would be to choose additions to sum as our relevant primitive operation and see how many of them this function performs based on its input: n inside the loop and one more at the end. We will write that down as:
f(n) = n + 1
What this means is that, given an input of size n, we will need to perform n + 1 additions. A list of 100 numbers will give us 101 additions and a list of 1000 numbers will give us 1001 additions. Now, with our definition of Big O from before, we can deduce that, approximately, SumFunction from above performs about n relevant, primitive operations, or that it runs in O(n) time. As you might know, f(n) = n + 1 is called a linear function (the value of the function grows linearly with the value of the input), so O(n) as an approximation is called linear time.
Linear function f(n) = n + 1
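If you want to verify the count yourself, here is a minimal sketch of mine (the CountAdditions helper is hypothetical, not part of the original example) that instruments SumFunction and confirms f(n) = n + 1:

public static int CountAdditions(List<int> n)
{
    int additions = 0;
    int sum = 0;
    for (int i = 0; i < n.Count; i++)
    {
        sum = sum + i;
        additions++;   // one addition to sum per iteration: n in total
    }
    sum = sum + 5;
    additions++;       // plus the final addition: n + 1 overall
    return additions;
}

// CountAdditions on a list of 100 items returns 101; on 1000 items, 1001.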
But what about our previous function, f(n) = 8? Well, first of all, we need to define a relevant, primitive operation for it as well; choosing addition gives us f(n) = 1. In time complexity theory this is called constant time and is written as O(1).
Let’s consider the following:
public void DoSomething(List<int> n)
{
    SimpleFunction(n);
    SumFunction(n);
}
We have an O(1) function and an O(n) function running one after the other, which gives us a running time of O(n) + O(1). Adding Big O approximations follows a simple rule: the term of the greatest order takes precedence and all others are disregarded. That means O(n + 1) = O(n) and O(n) + O(1) = O(n). In the same manner, we can disregard any constants: f(n) = 3n + 5 falls under the O(n) order. This is very important for us, because it lets us identify the weak spot, the one function that grows the fastest and will eventually take our system down.
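To illustrate, here is a made-up function of mine (counting additions as the relevant operation, as before) whose exact count is f(n) = 3n + 5, yet it still lands in O(n):

public void ThreeNPlusFive(List<int> n)
{
    int a = 0, b = 0, c = 0;
    for (int i = 0; i < n.Count; i++)
    {
        a = a + 1;   // n additions
        b = b + 2;   // n additions
        c = c + 3;   // n additions
    }
    // five more additions, independent of the input size
    a = a + 1; a = a + 1; a = a + 1; a = a + 1; a = a + 1;
    Console.WriteLine(a);
}

The exact count 3n + 5 and the plain n of SumFunction both grow linearly, which is all the O(n) label claims.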
Before we discuss the upper bound and other function orders, let me get back to primitive operations. As I’ve mentioned, you can choose your relevant, primitive operations yourself, and I’ll give you an example of what this means in practice. Let’s take a look at this piece of code:
public void Email(List<Employee> employees)
{
    for (int i = 0; i < employees.Count; i++)
    {
        _emailService.SendHolidayNotificationEmail(employees[i].Email);
        _logger.Log("Email sent");
    }
}
We know that sending an email is going to take a lot more time than logging, so it’s reasonable for us to choose it as the primitive operation for this function. The SendHolidayNotificationEmail method itself might perform many primitive operations internally, perhaps even quadratically many in the length of the email address (e.g. for validation), but we can abstract that away, because that cost is not a function of our method’s input. Changing the size of the list of employees will not change the running time of a single SendHolidayNotificationEmail call.
We can safely disregard the logging as a trivial operation (as we do in practice, most of the time). There are no hard and fast rules on what should be a primitive operation, but you should always bear in mind that you need to define one for your function before you try to measure the performance; otherwise it’s just wild guessing.
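One way to make that choice explicit is to instrument the code and count only the chosen operation. A minimal sketch, assuming the same _emailService and _logger as above (the counter is my addition, not part of the original example):

public void Email(List<Employee> employees)
{
    int emailsSent = 0;  // counts only our chosen primitive operation
    for (int i = 0; i < employees.Count; i++)
    {
        _emailService.SendHolidayNotificationEmail(employees[i].Email);
        emailsSent++;                // one relevant operation per iteration
        _logger.Log("Email sent");   // disregarded as trivial
    }
    _logger.Log($"Relevant operations performed: {emailsSent}");  // expect f(n) = n
}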
Consider the following piece of code:
public void FindItem(List<Employee> employees)
{
    string name = "";
    for (int i = 0; i < employees.Count; i++)
    {
        if (employees[i].Name == "Mark")
        {
            name = employees[i].Name;
            break;
        }
    }
    Console.WriteLine(name);
}
What is the running time of this function? …
Well, it depends on whether there is an employee with that name and on his position in the list. Declaring an array access as our primitive operation and given a list of size n, we could perform anywhere between 1 and n array accesses. With no specific statistical distribution, we can safely assume that, on average, Mark would be found somewhere in the middle and we would need to perform approximately n/2 array accesses to find him.
But Big O, as it is commonly used, is not about the average case; it’s about the worst case, and the worst case here would be n array accesses (Mark is the last item, or not in the list at all). More specifically, saying that something runs in O(n) time means that the algorithm will need to perform at most a number of operations proportional to n, never more.
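A quick sketch of the three cases, using made-up data and a minimal stand-in for the Employee type (both are my assumptions, not from the original article):

public record Employee(string Name);  // minimal stand-in; the article's type also has an Email property

// Best case: Mark is first, a single array access.
var best = new List<Employee> { new Employee("Mark"), new Employee("Ana"), new Employee("John") };

// Average case: Mark sits somewhere in the middle, roughly n/2 accesses.
var average = new List<Employee> { new Employee("Ana"), new Employee("Mark"), new Employee("John") };

// Worst case: no Mark at all, every one of the n elements is accessed.
var worst = new List<Employee> { new Employee("Ana"), new Employee("John"), new Employee("Zoe") };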
There are different types of running times. Take a look at this example:
public void Multiplies(int n)
{
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            Console.WriteLine(i * j);
        }
    }
}
With console writes as our primitive operation, we can see that we need to perform n * n, or n², operations. This is quadratic time, or O(n²).
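To get a feel for that growth, plug a few values into the formula (my numbers, following f(n) = n²): doubling the input quadruples the work.

f(10) = 10² = 100 console writes
f(20) = 20² = 400 console writes
f(40) = 40² = 1,600 console writes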
But what about this?
public void MultipliesWithCatch(int n)
{
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < 5; j++) // <-- only to 5
        {
            Console.WriteLine(i * j);
        }
    }
}
Is this O(n²)?
No, it’s not. Given an input of size n, the maximum number of operations performed will be 5 * n, which puts it under the O(n) order. This is a very important thing to notice: Big O describes function growth relative to the size of the input. Changing the size of the input does not affect the number of operations in the inner loop, as it will always be five. Now we’re getting somewhere!
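Compare the same inputs as before (again my numbers, from f(n) = 5n): here, doubling the input merely doubles the work.

f(10) = 5 * 10 = 50 console writes
f(20) = 5 * 20 = 100 console writes
f(40) = 5 * 40 = 200 console writes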
Now, there are other types of functions, so there must be other types of Big O notations, right? A cubic function would give us cubic running time, exponential functions would give us exponential running times, and so on.
Different types of functions
But there are more complex functions, so what about them? Well, as we have seen, the highest-order term takes precedence and the others are disregarded. So let’s assume that we have measured the exact number of primitive operations that some function performs, and we got the following result:
f(n) = n⁵ + 6n² + 5
By the rules mentioned above, this belongs to O(n⁵). There are infinitely many variations, but the running time is always determined by the fastest-growing part of the function. These are called polynomial running times, of which O(n) and O(n²) are a subset. There are other types of running times as well, such as exponential, logarithmic or factorial, each with certain famous algorithms behind it.
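To see why the lower-order terms can be disregarded, plug a moderately large input into the formula above, say n = 100 (my arithmetic):

n⁵ = 10,000,000,000
6n² = 60,000
5 = 5

The n⁵ term contributes virtually the entire total, and its dominance only grows as n grows.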
We will discuss other common running times described through Big O notation in the next part.