Duck Typing, Scope, and Investigative Functions in Python

Python is a duck typing language. This means the data types of variables can change as long as the syntax remains compatible. Python is also a dynamic programming language, meaning we can change the program while it runs, including defining new functions and changing the scope in which names are resolved. This gives us not only a new paradigm for writing Python code but also a new set of tools for debugging. In the following, we will see what we can do in Python that cannot be done in many other languages.

After finishing this tutorial, you will know:

  • How Python manages the variables you define
  • How Python code uses a variable and why we don’t need to define its type like in C or Java

Let’s get started.

Duck typing, scope, and investigative functions in Python. Photo by Julissa Helmuth. Some rights reserved

Overview

This tutorial is in three parts; they are:

  • Duck typing in programming languages
  • Scopes and name space in Python
  • Investigating the type and scope

Duck Typing in Programming Languages

Duck typing is a feature of some modern programming languages that allows data types to be dynamic.

The Python Glossary describes duck typing as:

A programming style which does not look at an object’s type to determine if it has the right interface; instead, the method or attribute is simply called or used (“If it looks like a duck and quacks like a duck, it must be a duck.”) By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution.

Simply speaking, the program should allow you to swap data structures as long as the same syntax still makes sense. In C, for example, squaring a number requires a separate function for each argument type: one that takes an int and another that takes a double.

While the operation x * x is identical for integers and floating-point numbers, a function taking an integer argument and a function taking a floating-point argument are not the same. Because types are static in C, we must define two functions even though they perform the same logic. In Python, types are dynamic; hence we can define the corresponding function once as:
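Here is a minimal sketch of such a function (the name square() is illustrative). Python never checks the argument’s type, so the same definition works for any object that supports *, including complex numbers:

```python
def square(x):
    # No type declaration: any object that supports * is accepted
    return x * x

print(square(3))       # 9 (int)
print(square(1.5))     # 2.25 (float)
print(square(2 + 1j))  # (3+4j) (complex)
```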

This feature indeed gives us tremendous power and convenience. For example, from scikit-learn, we have a function to do cross validation:
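The idea can also be shown without dependencies. In the sketch below, simple_cross_val() is a hypothetical helper, not part of scikit-learn. It only ever calls fit() and score() on whatever it receives, so two unrelated classes both work. The real cross_val_score() does considerably more, but it relies on the same kind of interface.

```python
def simple_cross_val(model, X, y, k=3):
    """Hypothetical k-fold cross validation: only needs model.fit() and model.score()."""
    fold = len(X) // k
    scores = []
    for i in range(k):
        start, end = i * fold, (i + 1) * fold
        # Train on everything outside the current fold (X and y are lists here)
        model.fit(X[:start] + X[end:], y[:start] + y[end:])
        # Evaluate on the held-out fold
        scores.append(model.score(X[start:end], y[start:end]))
    return scores

class MeanModel:
    """Predicts the mean of the training targets."""
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self
    def score(self, X, y):
        # Negative mean absolute error, so higher is better
        return -sum(abs(t - self.mean) for t in y) / len(y)

class LastValueModel:
    """An unrelated class that predicts the last training target."""
    def fit(self, X, y):
        self.last = y[-1]
        return self
    def score(self, X, y):
        return -sum(abs(t - self.last) for t in y) / len(y)

X = list(range(9))
y = [2 * v for v in X]
print(simple_cross_val(MeanModel(), X, y))
print(simple_cross_val(LastValueModel(), X, y))
```

Neither class inherits from a common base. simple_cross_val() accepts both simply because they "quack" the same way.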

In the above, model is a variable holding a scikit-learn model object. It doesn’t matter whether it is a perceptron model as in the above, a decision tree, or a support vector machine. What matters is that inside the cross_val_score() function, the data will be passed to the model through its fit() function. Therefore, the model must implement the fit() member function, and that function must behave as expected. The consequence is that cross_val_score() does not expect any particular model type, as long as the object looks like a model. If we are using Keras to build a neural network model, we can make the Keras model look like a scikit-learn model with a wrapper:

In the above, we used the wrapper from Keras. Other wrappers exist, such as scikeras. All a wrapper does is make sure the interface of the Keras model looks like a scikit-learn classifier, so you can use the model with the cross_val_score() function. If we replace the model above with:

then the scikit-learn function will complain as it cannot find the model.score() function.
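Both situations can be reproduced in a small dependency-free sketch. The names PredictOnlyModel, ScoringWrapper, and evaluate() are hypothetical and are not part of Keras or scikit-learn. An object without score() is rejected with an AttributeError. Wrapping it in an adapter that supplies score() makes it acceptable, which is the same adapter idea the Keras wrapper and scikeras apply on a larger scale.

```python
class PredictOnlyModel:
    """Hypothetical model with fit() and predict() but no score()."""
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.mean for _ in X]

class ScoringWrapper:
    """Hypothetical adapter that adds score() to any fit()/predict() model."""
    def __init__(self, model):
        self.model = model
    def fit(self, X, y):
        self.model.fit(X, y)
        return self
    def predict(self, X):
        return self.model.predict(X)
    def score(self, X, y):
        # Negative mean absolute error computed from predict()
        preds = self.predict(X)
        return -sum(abs(p - t) for p, t in zip(preds, y)) / len(y)

def evaluate(model, X, y):
    # Like a cross-validation helper, this relies only on fit() and score()
    model.fit(X, y)
    return model.score(X, y)

X, y = [1, 2, 3], [2.0, 4.0, 6.0]
try:
    evaluate(PredictOnlyModel(), X, y)
except AttributeError as err:
    print("Rejected:", err)
print(evaluate(ScoringWrapper(PredictOnlyModel()), X, y))
```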

Similarly, because of duck typing, we can reuse a function that expects a list with a NumPy array or a pandas Series, because they all support the same indexing and slicing operations. For example, we fit a time series with ARIMA as follows:

The above should produce the same AIC scores for each fitting.
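The ARIMA example needs statsmodels. A smaller sketch of the same point uses a hypothetical moving_average() function. It only uses len() and slicing, so it gives identical results for a Python list and a NumPy array:

```python
import numpy as np

def moving_average(series, window):
    """Mean of each trailing window; relies only on len() and slicing."""
    return [float(sum(series[i - window:i])) / window
            for i in range(window, len(series) + 1)]

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(moving_average(data, 2))            # [1.5, 2.5, 3.5, 4.5] from a list
print(moving_average(np.array(data), 2))  # [1.5, 2.5, 3.5, 4.5] from a NumPy array
```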

Scopes and Name Space in Python

In most languages, variables are defined in a limited scope. For example, a variable defined inside a function is accessible only inside that function:
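A sketch of such a function, assuming a quadratic() that returns the two real roots of a*x**2 + b*x + c = 0:

```python
from math import sqrt

def quadratic(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0 (assumes the discriminant is non-negative)."""
    discrim = b * b - 4 * a * c   # local variable: exists only inside quadratic()
    x = -b / (2 * a)
    y = sqrt(discrim) / (2 * a)
    return x - y, x + y

print(quadratic(1, -3, 2))   # (1.0, 2.0), the roots of x**2 - 3x + 2
try:
    print(discrim)           # not defined at module level
except NameError as err:
    print("NameError:", err)
```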

The local variable discrim is not accessible at all outside the function quadratic(). Moreover, the following may surprise some readers:

We defined the variable a outside function f, but inside f, variable a is assigned 2 * x. However, the a inside the function and the one outside are unrelated except for the name. Therefore, when we exit the function, the value of the outer a is untouched. To make it modifiable inside function f, we need to declare the name a as global, making it clear that this name comes from the global scope, not the local scope:
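A minimal sketch contrasting the two cases. The function name f_global is illustrative:

```python
a = 1

def f(x):
    a = 2 * x      # creates a new local a; the global a is untouched
    return a

def f_global(x):
    global a       # refer to the module-level a instead of creating a local one
    a = 2 * x
    return a

f(3)
print(a)           # 1
f_global(3)
print(a)           # 6
```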

The issue becomes more complicated when we introduce nested scopes in functions. Consider the following example:
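A sketch of such nesting, using names chosen to match the discussion that follows:

```python
a = 1

def f(x):
    a = 2 * x            # local to f, distinct from the global a
    def g(x):
        # g only reads a, so Python finds it in the enclosing scope of f;
        # x here is g's own parameter, not f's x
        return a * x
    return g(3)

print(f(5))   # 30: a is 10 inside f, and g(3) returns 10 * 3
print(a)      # 1: the global a is unchanged
```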

The variable a inside function f is distinct from the global one. However, inside g, since a is only read and never assigned, Python finds a in the nearest enclosing scope, i.e., function f. The variable x, however, is defined as an argument of function g, so it takes the value 3 when we call g(3) rather than the value of x from function f.

NOTE: If a variable is assigned a value anywhere in a function, it belongs to that function’s local scope. If that variable is then read before the assignment, Python raises an error (UnboundLocalError) rather than using the value of a variable with the same name from the outer or global scope.
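A minimal sketch of this pitfall. The function name read_then_write() is illustrative:

```python
a = 1

def read_then_write():
    print(a)   # a is assigned below, so here it is local and not yet bound
    a = 2

try:
    read_then_write()
    outcome = "no error"
except UnboundLocalError as err:
    outcome = type(err).__name__
    print(outcome, err)
```

Note that the global a = 1 is never consulted, even though the read happens before the local assignment.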

This property has many uses. Many implementations of memoization decorators in Python make clever use of the function scopes. Another example is the following:
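A sketch of such a generator function, following the description in the next paragraph. The sampling_rate parameter and its default of 0.7 are assumptions made for illustration:

```python
import numpy as np

def datagen(X, y, batch_size, sampling_rate=0.7):
    """Yield random batches drawn from a fixed random subset of the rows of X and y."""
    # Pick a fixed subset of rows once, when datagen() is called
    n_sample = int(sampling_rate * len(X))
    indices = np.random.choice(len(X), size=n_sample, replace=False)
    Xsam, ysam = X[indices], y[indices]

    def _gen(batch_size):
        # Xsam and ysam are not local to _gen(); they come from datagen()'s scope
        while True:
            Xbatch, ybatch = [], []
            for _ in range(batch_size):
                i = np.random.randint(len(Xsam))
                Xbatch.append(Xsam[i])
                ybatch.append(ysam[i])
            yield np.array(Xbatch), np.array(ybatch)

    # Create and return a generator object
    return _gen(batch_size)

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
gen = datagen(X, y, batch_size=4)
Xb, yb = next(gen)
print(Xb.shape, yb.shape)   # (4, 2) (4,)
```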

This is a generator function that creates batches of samples from the input NumPy arrays X and y. Keras models accept such a generator for training. However, for reasons such as cross validation, we do not want to sample from the entire input arrays X and y but from a fixed subset of their rows. We do this by randomly selecting a portion of rows at the beginning of the datagen() function and keeping them in Xsam and ysam. Then, in the inner function _gen(), rows are sampled from Xsam and ysam until a batch is created. While the lists Xbatch and ybatch are defined and created inside the function _gen(), the arrays Xsam and ysam are not local to _gen(). What’s more interesting is what happens when the generator is created:

The function datagen() is called twice, and therefore two different sets of Xsam and ysam are created. But since the inner function _gen() depends on them, these two sets of Xsam and ysam are in memory concurrently. Technically, we say that when datagen() is called, a closure is created with the specific Xsam and ysam defined within, and the call to _gen() accesses that closure. In other words, the scopes of the two incarnations of datagen() calls coexist.
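The same mechanism can be seen without NumPy. In this minimal sketch, make_scaler() is a hypothetical name. Each call to the outer function creates a new closure, and the __closure__ attribute of the returned function exposes the value it captured:

```python
def make_scaler(factor):
    # factor lives in make_scaler()'s scope; the inner function keeps it alive
    def scale(value):
        return value * factor
    return scale

double = make_scaler(2)
triple = make_scaler(3)

# Two independent closures coexist, each holding its own factor
print(double(10), triple(10))                # 20 30
print(double.__closure__[0].cell_contents)   # 2
print(triple.__closure__[0].cell_contents)   # 3
```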

In summary, whenever a line of code references a name (whether it is a variable, a function, or a module), the name is resolved in the order of the LEGB rule:

  1. Local scope first, i.e., those names that were defined in the same function
  2. Enclosing, or “nonlocal”, scope, i.e., the scope of the outer function when we are inside a nested function
  3. Global scope, i.e., those names defined at the top level of the same module or script (but not names from other modules)
  4. Built-in scope, i.e., those created by Python automatically, such as the functions list() and print()
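A minimal sketch touching all four levels of the LEGB rule. It also shows the nonlocal statement, which rebinds a name in the enclosing scope. The function names are illustrative:

```python
import builtins

x = "global"                        # G: defined at module level

def outer():
    x = "enclosing"                 # E: local to outer(), enclosing for the inner functions
    def inner():
        x = "local"                 # L: local to inner()
        return x
    def inner_reads_enclosing():
        return x                    # no local x here, so outer()'s x is found
    return inner(), inner_reads_enclosing()

def counter():
    count = 0
    def increment():
        nonlocal count              # rebind the enclosing name instead of creating a local one
        count += 1
        return count
    return increment

print(outer())                      # ('local', 'enclosing')
print(x)                            # global
print("len" in dir(builtins))       # True: len() comes from the built-in scope
tick = counter()
tick()
print(tick())                       # 2
```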

Investigating the type and scope

Because the types are not static in Python, sometimes we would like to know what we are dealing with, but it is not trivial to tell from the code. One way to tell is using the type() or isinstance() functions. For example:
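A small sketch of the difference between the two functions. Note that isinstance() also accepts a tuple of types and takes subclasses into account, while a type() comparison does not:

```python
x = 42
print(type(x))                       # <class 'int'>
print(type(x) is int)                # True
print(isinstance(x, int))            # True
print(isinstance(x, (float, str)))   # False: checks against each type in the tuple
print(isinstance(True, int))         # True: bool is a subclass of int
print(type(True) is int)             # False: type() ignores subclass relationships
```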

The type() function returns a type object. The isinstance() function returns a Boolean that tells us whether something is an instance of a particular type. These are useful when we need to know the type of a variable, especially when debugging. For example, suppose we pass a pandas DataFrame to the datagen() function that we defined above:

Running the above code under Python’s debugger, pdb, gives the following:

We see from the traceback that something is wrong because we cannot get ysam[i]. We can use the following to verify that ysam is indeed a pandas DataFrame instead of a NumPy array:
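Outside the debugger, the same check can be sketched with a small made-up DataFrame (y_frame and y_array are illustrative names). It also shows that indexing a DataFrame with [] looks up a column label rather than a row position:

```python
import numpy as np
import pandas as pd

y_array = np.array([0, 1, 0])
y_frame = pd.DataFrame({"label": [0, 1, 0]})

print(type(y_frame))
print(isinstance(y_frame, pd.DataFrame))   # True
print(isinstance(y_frame, np.ndarray))     # False
print(y_array[1])                          # 1: [] on an array selects by position
try:
    y_frame[1]                             # [] on a DataFrame selects a column by label
except KeyError as err:
    print("KeyError:", err)
```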

Therefore we cannot use ysam[i] to select row i from ysam. What can we do in the debugger to verify how we should modify our code? There are several useful functions you can use to investigate the variables and the scope:

  • dir() to see the names defined in the scope or the attributes defined in an object
  • locals() and globals() to see the names and values defined locally and globally, respectively.

For example, we can use dir(ysam) to see what attributes or functions are defined inside ysam:

Some of these are attributes, such as shape, and some are functions, such as describe(). You can read an attribute or invoke a function in pdb. Reading this output carefully reminds us that the way to read row i from a DataFrame is through iloc, and hence we can verify the syntax with:
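A sketch of the same investigation outside pdb, using a small made-up DataFrame named ysam. It filters the dir() output down to a few names of interest and then reads a row by position with iloc:

```python
import pandas as pd

ysam = pd.DataFrame({"label": [0, 1, 0, 1]})

# dir() returns a sorted list; keep only a few names of interest
names = [n for n in dir(ysam) if n in ("shape", "describe", "iloc")]
print(names)                   # ['describe', 'iloc', 'shape']
print(ysam.shape)              # (4, 1)
print(ysam.iloc[1])            # row 1, returned as a Series
print(ysam.iloc[1]["label"])   # 1
```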

If we call dir() without any argument, it gives you all the names defined in the current scope, e.g.,

where the scope changes as you move around the call stack. Similar to dir() without argument, we can call locals() to show all locally defined variables, e.g.,

Indeed, locals() returns a dict that lets you see all the names and their values. Therefore, if we need to read the variable Xbatch, we can get the same value with locals()["Xbatch"]. Similarly, we can use globals() to get a dictionary of the names defined in the global scope.
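The same functions also work in ordinary code, not just in pdb. In this minimal sketch, make_batch() and threshold are illustrative names:

```python
threshold = 0.5   # a global name

def make_batch():
    Xbatch = [[1, 2], [3, 4]]
    ybatch = [0, 1]
    print(dir())                     # ['Xbatch', 'ybatch']: names in the local scope
    print(locals()["Xbatch"])        # [[1, 2], [3, 4]]
    print("threshold" in globals())  # True
    return locals()

scope = make_batch()
```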

This technique is sometimes beneficial. For example, we can check whether a Keras model is “compiled” by using dir(model). In Keras, compiling a model sets up the loss function for training and builds the flow for forward and backward propagation. Therefore, a compiled model will have an extra attribute, loss, defined:

This allows us to put an extra guard on our code before we run into an error.
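The guard itself can be sketched without Keras. FakeModel is a hypothetical stand-in that only mimics the behavior described above: its compile() adds a loss attribute. With a real Keras model, confirm that this attribute behaves the same way in your version before relying on it. Also, hasattr(model, "loss") is an equivalent and more direct test than searching dir(model).

```python
class FakeModel:
    """Hypothetical stand-in for a Keras model: compile() sets a loss attribute."""
    def compile(self, loss):
        self.loss = loss

def check_compiled(model):
    # Guard: refuse to continue if the model has no loss attribute yet
    if "loss" not in dir(model):
        raise RuntimeError("model is not compiled; call compile() first")

model = FakeModel()
print("loss" in dir(model))   # False
model.compile(loss="mse")
print("loss" in dir(model))   # True
check_compiled(model)         # passes silently now
```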

Summary

In this tutorial, you’ve seen how Python organizes the naming scopes and how variables interact with the code. Specifically, you learned:

  • Python code uses variables through their interfaces; therefore, a variable’s data type is usually unimportant
  • Python variables are defined in their naming scope or closure, where variables of the same name can coexist in different scopes without interfering with each other
  • Python provides built-in functions that let us examine the names defined in the current scope or the data type of a variable
