The Meaning of Functions in Julia

by Tom Kwong, August 14th, 2020

When I first learned about the Julia programming language, there were a few things that gave me the "wat" moments. One of those surprises involves both the naming and meaning of functions.

Interestingly, my naive question triggered over 200 follow-up posts in the Julia Discourse forum. 200! That's one of my best records for motivating fellow developers! 😄

What is the issue?

Let's first take a look at a very simple example.

Suppose that I have a CalendarApp module that contains the following code:

using Dates    # provides DateTime and Hour

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

Then, I want to create a function that calculates the length of a meeting. Super simple, right?  Let's go for it:

length(m::Meeting) = Hour(m.end_time - m.start_time)

When I code, I like a REPL-based development workflow so I can test new code quickly:

julia> covid_meeting = Meeting("COVID Response Committee",
                               DateTime(2020, 6, 14, 8, 0, 0),
                               DateTime(2020, 6, 14, 10, 0, 0))
Meeting("COVID Response Committee", 2020-06-14T08:00:00, 2020-06-14T10:00:00)

julia> println(length(covid_meeting))
2 hours

So far so good! Now, let's try to use the length function to determine the length of an array.

julia> length([1,2,3])
ERROR: MethodError: no method matching length(::Array{Int64,1})
You may have intended to import Base.length
Closest candidates are:
  length(::Meeting) at REPL[3]:1

Wat! That's right. Here we get the exact "wat" moment. What happened to the regular length function?

😵 There are two length functions!

The answer is quite simple. There are actually two length functions around. One of them is defined in the Base module, which everyone is familiar with, and the other is the one I just defined above.

Here's my own length function:

julia> length
length (generic function with 1 method)

Now, restart the REPL to clear things up and try again:

julia> length([1,2,3])
3

julia> length
length (generic function with 81 methods)

Now, I am able to access the original length function again. You may also notice that this length function has 81 methods attached.

So, how did that happen? It seems that I might have hidden the original length function by defining my own length function earlier. Out of curiosity, I can define my own function again:

julia> struct Meeting
           subject::String
           start_time::DateTime
           end_time::DateTime
       end

julia> length(m::Meeting) = Hour(m.end_time - m.start_time)
ERROR: error in method definition: function Base.length must be explicitly imported to be extended

Man, now it's doing the exact opposite! It doesn't even let me define a length function anymore!

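This is the second "wat" moment for the same problem. The difference is that this time I had already called length in the session, so the name was bound to Base.length, and Julia will not add a method to a function owned by another module unless I ask for that explicitly.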

🤔 Did I do anything wrong?

It might be worth a quick discussion here about why I did what I did, and why I thought I was right.

First of all, I came from an object-oriented programming background.  To be more precise, I had many years of experience developing in the Java language.

How would the same problem look in OOP? Well, in the object-oriented world, there is probably some kind of Array class that defines a length method. In this case, I would also define a Meeting class with a length method. For instance:

my_array.length();        // invokes the length method defined in Array class
my_meeting.length();      // invokes the length method defined in Meeting class

When I call the method, there is no ambiguity.  These are just two different methods from two different classes. 

But wait... Didn't I just do the same thing in Julia? If I look at the signature of my length function, it accepts an argument of data type Meeting. So, why couldn't Julia just call my function when I pass a Meeting object, and call the regular length function when I pass an array?

Here is my primary misconception.

Multiple dispatch only chooses among the methods of a single function. What I did above actually introduced a second length function, and that function has just a single method attached.

More precisely, the two length functions are defined in their own modules. Let me prefix them with their respective namespaces and show the number of methods for each:

Base.length               # 81 methods
CalendarApp.length        # 1 method
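
To see the two functions side by side, here is a minimal sketch that wraps the earlier code in a CalendarApp module. The meeting subject and times are made up for illustration; the only point is that an unqualified definition creates a function that is separate from Base.length.

```julia
using Dates

module CalendarApp
using Dates

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

# No import and no `Base.` prefix: this creates a brand-new function
# named `length` that is owned by CalendarApp.
length(m::Meeting) = Hour(m.end_time - m.start_time)
end

# Illustrative data, not from the original session.
meeting = CalendarApp.Meeting("Standup",
                              DateTime(2020, 6, 14, 8),
                              DateTime(2020, 6, 14, 10))

# Two different function objects that merely share a name.
same_function = CalendarApp.length === Base.length        # false

# CalendarApp.length knows about Meeting, but nothing about arrays.
handles_meeting = hasmethod(CalendarApp.length, Tuple{CalendarApp.Meeting})
handles_arrays  = hasmethod(CalendarApp.length, Tuple{Vector{Int}})

println(same_function, " ", handles_meeting, " ", handles_arrays)
```

Because dispatch only looks at the methods of the function that was called, CalendarApp.length can never fall back to the array method that lives in Base.length.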

🐛 Here's the easy fix...

As I want multiple dispatch to kick in, I just need to make sure that I define a new method for the Base.length function rather than defining my own function. This is also called extending a function. There are two ways to achieve that.

Option #1 (preferred): prefix the function name with the module name.

Base.length(m::Meeting) = Hour(m.end_time - m.start_time)

Option #2: import the length function before defining it.

import Base: length

length(m::Meeting) = Hour(m.end_time - m.start_time)

Now, let's start a new REPL and try again:

julia> struct Meeting
           subject::String
           start_time::DateTime
           end_time::DateTime
       end

julia> Base.length(m::Meeting) = Hour(m.end_time - m.start_time)

julia> length
length (generic function with 82 methods)

Alright, the length function now has 82 methods attached.

Let's confirm its functionality.

julia> covid_meeting = Meeting("COVID Response Committee",
                               DateTime(2020, 6, 14, 8, 0, 0),
                               DateTime(2020, 6, 14, 10, 0, 0))
Meeting("COVID Response Committee", 2020-06-14T08:00:00, 2020-06-14T10:00:00)

julia> length(covid_meeting)
2 hours

julia> length([1,2,3])
3

Voila! Problem solved!
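
One way to double-check that the definition extended Base.length instead of creating a second function is to compare the two names directly. The sketch below uses Option #2 inside a CalendarApp module; the meeting data is illustrative.

```julia
using Dates

module CalendarApp
using Dates
import Base: length    # bring Base.length into scope so it can be extended

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

# Thanks to the import, this adds a method to Base.length.
length(m::Meeting) = Hour(m.end_time - m.start_time)
end

# Illustrative data, not from the original session.
meeting = CalendarApp.Meeting("Standup",
                              DateTime(2020, 6, 14, 8),
                              DateTime(2020, 6, 14, 10))

# Both names now refer to one and the same function.
same_function = CalendarApp.length === Base.length        # true

# The method for Meeting was defined in CalendarApp, yet it belongs to Base.length.
defining_module = which(length, (CalendarApp.Meeting,)).module

println(same_function, " ", defining_module)
```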

📌 Wait, why do I have to do that?

The solution is simple once I understand how multiple dispatch works in Julia.

So, how did I trigger 200+ follow-up posts in Discourse?

The main controversy is why I have to be explicit about extending Base.length. Since Base.length has a name of length, and CalendarApp.length has a name of length, why wouldn't Julia just automatically merge them?

The whole discussion thread in Discourse is about how it would be more convenient and less confusing for new Julia users if the functions were merged automatically. I will now argue (against my original opinion in the Discourse thread) that it is a bad idea to do so.

Here is the main reason: just because two functions have the same name doesn't imply that they mean the same thing.

Every function is designed to have a specific meaning. In English, the meaning of the length function is pretty much aligned with what one commonly understands a length to be.

To be clear, I will just show the first definition from Dictionary.com:

Length (Noun): the longest extent of anything as measured from end to end.

So, the length concept refers to a measurement. As with any kind of measurement, it means that I should expect it to return a numerical value.

Hence, when anyone calls the length function, a number is expected to be returned.

This is literally an implicit contract.

Enforcing the same meaning for all length methods turns out to be a very useful thing. Right off the bat, I can display a graphical user interface that shows a bar that represents a measurement. The same component works regardless of whether the object is an array, a String, or a Meeting.

This is also the main reason why Julia packages interoperate so well with each other!

As long as there are consistent names and meanings, we can build very powerful abstractions and interfaces. Then, everything just works with each other in harmony.

You don't buy it yet?  Just take a look at the various types of Julia array implementations.  These arrays can be used anywhere a regular array is accepted.
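
As a rough illustration of the bar component mentioned above, here is a text-only sketch. The names text_bar and magnitude are hypothetical helpers invented for this example, and the meeting data is made up. The sketch assumes every length method returns either an integer or a Dates period.

```julia
using Dates

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

Base.length(m::Meeting) = Hour(m.end_time - m.start_time)

# Hypothetical helper: reduce whatever `length` returns to a plain integer.
magnitude(n::Integer) = n
magnitude(p::Period) = Dates.value(p)

# Hypothetical stand-in for the GUI bar: one '#' per unit of length.
text_bar(x) = repeat("#", magnitude(length(x)))

# Illustrative data.
meeting = Meeting("Standup", DateTime(2020, 6, 14, 8), DateTime(2020, 6, 14, 10))

println(text_bar([1, 2, 3]))    # ###
println(text_bar("hello"))      # #####
println(text_bar(meeting))      # ##
```

The component never checks which type it was given. It only relies on length meaning "a measurement" for all of them.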

😈 Playing devil's advocate...

Now, what happens if I ignore the implicit contract and define the length of a meeting to be a string? For instance:

function Base.length(m::Meeting)
    if m.end_time - m.start_time > Hour(1)
        return "Long"
    else
        return "Short"
    end
end

Well, it's probably fine because Meeting is my own data type.

However, it also means that I should not let anyone else use Meeting. Why? Because another developer would probably be very confused to find my length function returning a string rather than a number, and that could cause serious problems.

Remember the GUI component I talked about earlier? It's going to be so broken.

Not keeping a consistent meaning (implicit contract) for a function is a recipe for failure. It severely limits the reusability of functions.
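
To make the breakage concrete, here is a small sketch of a generic consumer that expects length to return an integer. The names text_bar and magnitude are hypothetical helpers for this example only, and the meeting data is made up.

```julia
using Dates

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

# The contract-breaking definition from above: `length` returns a String.
function Base.length(m::Meeting)
    return m.end_time - m.start_time > Hour(1) ? "Long" : "Short"
end

# Hypothetical consumer that trusts `length` to be a measurement.
magnitude(n::Integer) = n
text_bar(x) = repeat("#", magnitude(length(x)))

# Illustrative data.
meeting = Meeting("Standup", DateTime(2020, 6, 14, 8), DateTime(2020, 6, 14, 10))

# Arrays still work, but the Meeting case fails because there is no
# `magnitude` method for a String.
outcome = try
    text_bar(meeting)
catch err
    err
end

println(text_bar([1, 2, 3]))
println(outcome isa MethodError)
```

The failure shows up inside the consumer, far away from the definition that caused it, which is part of what makes a broken contract hard to track down.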

🤓 What if I really want to use the same function name for a different purpose?

If I insist that my length function should return a string, then I really have two options.

First, I can define my own function and not extend Base.length. Second, I could choose a different name for the function.

In the first scenario, I would be able to access both length functions. The caveat is that I will have to use Base.length and CalendarApp.length instead of the short form.

This is needed to remove the ambiguity about which function I'm referring to.
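
A minimal sketch of the first scenario might look like the following. The attendee_count function and the sample data are hypothetical. They are only there to show where the qualified name becomes necessary.

```julia
using Dates

module CalendarApp
using Dates

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

# CalendarApp's own `length`: deliberately not an extension of Base.length.
length(m::Meeting) = m.end_time - m.start_time > Hour(1) ? "Long" : "Short"

# Inside this module the short name `length` now means CalendarApp.length,
# so the Base function has to be spelled out in full.
attendee_count(names::Vector{String}) = Base.length(names)
end

# Illustrative data.
meeting = CalendarApp.Meeting("Standup",
                              DateTime(2020, 6, 14, 8),
                              DateTime(2020, 6, 14, 10))

println(CalendarApp.length(meeting))                   # Long
println(CalendarApp.attendee_count(["Ann", "Bob"]))    # 2
```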

The best practice, however, is to avoid giving a function a name that is already used in Base. Why?

  1. All of the exported Base functions are automatically imported into every module with the exception of bare modules. So, you will have a conflict just like the one described at the beginning of this post.
  2. If you develop packages, then you don't want your users to be confused about your function versus the one in Base.  

Because the Base module is the standard library that everyone uses, it's probably not a good idea to define a function with the same name but a different meaning.

🛰️  What if the dependent module isn't Base?

Now, suppose that I am using a different module rather than Base. As an example, I'm going to pick on one of my favorite packages, Distributions.jl.

A typical Julia user would do the following:

using Distributions

I do that, too, when I need to use it interactively. However, if I need to use it in my app, then I would want to import only the functions that I need into my namespace. For example, if I want to calculate the mean and mode of some randomly generated data, I would do this:

using Distributions: mean, mode

This is actually quite important!

First, by bringing only known functions into my namespace, it reduces the chance of function name collision. Just take a look at the huge number of names exported by Distributions.jl.

Second, I'm making my code future-proof. Let's say I have already defined a function named dist in my module. My code will still work even if Distributions.jl happens to define and export its own dist in a future version. So, I don't need to worry about a naming conflict because I have only imported mean and mode into my namespace.
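
Here is a self-contained sketch of that situation. To avoid depending on an installed package, it uses a hypothetical FakeDistributions module as a stand-in for a third-party package that has started exporting dist. The MyApp module, its functions, and the simplistic mean and mode implementations are assumptions made for this example.

```julia
# Hypothetical stand-in for a package that now exports `dist` as well.
module FakeDistributions
export mean, mode, dist

mean(xs) = sum(xs) / length(xs)
# Simplistic mode: the first element with the highest occurrence count.
mode(xs) = xs[argmax([count(==(x), xs) for x in xs])]
dist(a, b) = abs(a - b)
end

module MyApp
# Only `mean` and `mode` enter this namespace; `dist` is not imported.
using ..FakeDistributions: mean, mode

# My own `dist`, unrelated to the one exported by the package.
dist(label) = "my own dist: $label"

summary_line(xs) = "mean=$(mean(xs)), mode=$(mode(xs))"
end

println(MyApp.summary_line([1, 2, 2, 3]))
println(MyApp.dist("demo"))
```

MyApp.dist and FakeDistributions.dist remain two separate functions, and nothing in MyApp has to change when the package adds new exports.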

Final thoughts...

Naming things properly is super important. Besides choosing the right word, it is also important to mean what you mean. 

Over the years, I have developed a habit that ensures I write code that means what I mean. And, it's actually super simple.

Just write documentation.

In Julia, I would write a doc string for every function at the same time that I code that function. Sometimes I change the function name to match my doc string. At other times, I change the doc string to match my function name.

It is quite amazing how effective this can be. I encourage you to give that a try today!
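
For the running example, a doc string could look something like this. The wording of the doc string is my own suggestion, and the meeting data is made up.

```julia
using Dates

struct Meeting
    subject::String
    start_time::DateTime
    end_time::DateTime
end

"""
    length(m::Meeting)

Return the duration of meeting `m` as an `Hour` value, measured from
`m.start_time` to `m.end_time`.
"""
Base.length(m::Meeting) = Hour(m.end_time - m.start_time)

# Illustrative data.
meeting = Meeting("Standup", DateTime(2020, 6, 14, 8), DateTime(2020, 6, 14, 10))
println(length(meeting))
```

Writing down "return the duration" forces the question of whether length is really the right name, which is exactly the check this habit is meant to provide.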

Thank you for reading. 

P.S. For more tips on writing good code in Julia, consider picking up my book Hands-on Design Patterns and Best Practices with Julia.

Lead image by Romain Vignes on Unsplash

Previously published on: https://ahsmart.com/pub/the-meaning-of-functions/