Theme: Programming Fundamentals
Topic: Complexity Analysis, Big-O, Data Structures, and Algorithms
Keywords: complexity analysis, big-o, data structures, and algorithms
Presenter: James Powell (james@dutc.io)
Date: Friday, March 12, 2021
Time: 1:00 PM PST
Okay, so your code is slow. That’s no good! Nobody likes slow code. But what does it really matter? And when does it really matter? And what does it all really mean?
This seminar will present an introductory view of a critical topic in software development: complexity analysis and Big-O notation as it applies to our choices of algorithms and data structures in Python programs. We will cover important questions like: is Python “fast” or “slow”? Are its core data structures “fast” or “slow”? And where do tools like numpy and pandas fit in?
Don’t Use This Code is a professional training, coaching, and consulting company. We are deeply invested in the open source scientific computing community, and are dedicated to bringing better processes, better tools, and better understanding to the world.
Don’t Use This Code is growing! We are currently seeking new partners, new clients, and new engagements within Facebook for our expert consulting and training services.
Teams looking to better employ these tools would benefit from our wide range of training courses on offer, ranging from an intensive introduction to Python fundamentals to advanced applications of Python for building large-scale, production systems. Working with your team, we can craft targeted curricula to meet your training goals. We are also available for consulting services such as building scientific computing and numerical analysis systems using technologies like Python and React.
We pride ourselves on delivering top-notch training. We are committed to providing quality training, and we do so by investing in three key areas: our content, our processes, and our contributors.
James Powell is a professional Python programmer and enthusiast. He got his start with the language by building reporting and analysis systems for proprietary trading offices; now, he uses his experience as a consultant for those building data engineering and scientific computing platforms for a wide range of clients using cutting-edge open source tools like Python and React.
He also currently serves as a Board Director, Chair, and Vice President at NumFOCUS, the 501(c)(3) non-profit that supports all the major tools in the Python data analysis ecosystem (e.g., pandas, numpy, jupyter, matplotlib). At NumFOCUS, he helps build global open source communities for data scientists, data engineers, and business analysts. He helps NumFOCUS run the PyData conference series and has sat on speaker selection and organizing committees for 18 conferences. James is also a prolific speaker: since 2013, he has given over seventy (70) conference talks at over fifty (50) Python events worldwide.
We will use the following simple context manager for assessing timings:
from contextlib import contextmanager
from time import perf_counter

@contextmanager
def timed(msg, prec=2):
    # `prec` controls how many decimal places of the elapsed time are reported
    start = perf_counter()
    try:
        yield
    finally:
        stop = perf_counter()
        print(f'{msg:<24} elapsed \N{greek capital letter delta}t: {stop-start:.{prec}f}s')
Let’s quickly check that it works:
from time import sleep
with timed('test #1'):
    sleep(.1)

with timed('test #2'):
    sleep(.5)

with timed('test #3'):
    with timed('test #3-i'):
        sleep(.25)
    with timed('test #3-ii'):
        sleep(.25)
    with timed('test #3-iii'):
        sleep(.5)
Looks good!
We’ll be comparing pure Python against numpy, pandas, and other tools. Let’s take a quick quiz to see how well informed we are about what is fast and slow in Python.
Key Questions:
- is Python “fast” or “slow”?
- are lists and dicts “fast” or “slow”?
- are generators “fast” or “slow”?
- are comprehensions “fast” or “slow”?
- is numpy “fast” or “slow”?
- is pandas “fast” or “slow”?
Question: Which is faster?
# generate a big bunch of numbers
from random import randint
with timed('i. for-loop'):
    xs = []
    for _ in range(1_000_000):
        xs.append(randint(-1_000, 1_000))
from random import randint
with timed('ii. list comprehension'):
    xs = [randint(-1_000, 1_000) for _ in range(1_000_000)]
from random import randint
with timed('iii. generator expression'):
    xs = list(randint(-1_000, 1_000) for _ in range(1_000_000))
from itertools import starmap, repeat
from random import randint
with timed('iv. list(itertools.starmap(…))'):
    xs = list(starmap(randint, repeat((-1_000, 1_000), 1_000_000)))
from itertools import starmap, repeat
from random import randint
with timed('v. [*itertools.starmap(…)]'):
    xs = [*starmap(randint, repeat((-1_000, 1_000), 1_000_000))]
from numpy.random import randint
with timed('vi. numpy.random.randint(…)'):
    xs = randint(-1_000, 1_000, size=1_000_000)
Question: which is faster?
from random import randint
xs = [randint(-1_000, 1_000) for _ in range(1_000_000)]
with timed('i. Python → sum([ … ])'):
    sum(xs)
from numpy.random import randint
xs = randint(-1_000, 1_000, size=1_000_000)
with timed('ii. `numpy` → array(…).sum()', prec=6):
    xs.sum()
Question: which is faster?
from random import randint
dot = lambda xs, ys: sum(x * y for x, y in zip(xs, ys))
xs = [randint(-1_000, 1_000) for _ in range(1_000_000)]
ys = [randint(-1_000, 1_000) for _ in range(1_000_000)]
with timed('i. Python → dot([…], […])', prec=4):
    dot(xs, ys)
from numpy.random import randint
xs = randint(-1_000, 1_000, size=1_000_000)
ys = randint(-1_000, 1_000, size=1_000_000)
with timed('ii. `numpy` → array(…).dot(…)', prec=4):
    xs.dot(ys)
from numpy.random import randint
dot = lambda xs, ys: sum(x * y for x, y in zip(xs, ys))
xs = randint(-1_000, 1_000, size=1_000_000)
ys = randint(-1_000, 1_000, size=1_000_000)
with timed('iii. `numpy` → dot(array(…), …)', prec=4):
    dot(xs, ys)
Question: which is faster?
from random import choice
from string import ascii_lowercase
xs = [''.join(choice(ascii_lowercase) for _ in range(4)) for _ in range(100_000)]
with timed('i. py → … in […]', prec=4):
    'xxxxx' in xs
from pandas import array
from random import choice
from string import ascii_lowercase
xs = array([''.join(choice(ascii_lowercase) for _ in range(4)) for _ in range(100_000)])
with timed('iii. pd → … in array(…)', prec=4):
    'xxxxx' in xs
from pandas import Series
from random import choice
from string import ascii_lowercase
xs = Series(dtype='float64', index=[''.join(choice(ascii_lowercase) for _ in range(4)) for _ in range(100_000)])
with timed('iv. pd → … in Series(…)', prec=4):
    'xxxxx' in xs
from numpy import array
from numpy.random import choice
from string import ascii_lowercase
xs = choice([*ascii_lowercase], size=(4, 100_000)).view('<U4').ravel()
with timed('v. np → … in array(<…>)', prec=4):
    'xxxxx' in xs
from pandas import Series
from numpy.random import choice
from string import ascii_lowercase
xs = Series(dtype='float64', index=choice([*ascii_lowercase], size=(4, 100_000)).view('<U4').ravel())
with timed('vi. pd → … in Series(<…>)', prec=4):
    'xxxxx' in xs
from pandas import Series, Categorical
from numpy.random import choice
from string import ascii_lowercase
xs = Series(dtype='float64', index=Categorical(choice([*ascii_lowercase], size=(4, 100_000)).view('<U4').ravel()))
with timed('vii. pd → … in Categorical(…))', prec=4):
    'xxxxx' in xs
from random import choice
from string import ascii_lowercase
xs = {''.join(choice(ascii_lowercase) for _ in range(4)) for _ in range(100_000)}
with timed('ii. py → … in {…}', prec=6):
    'xxxxx' in xs
Question: which is faster?
from random import randint
f = lambda x: x ** 2
xs = [randint(-1_000, 1_000) for _ in range(100_000)]
with timed('i. py → comprehension', prec=4):
    [f(x) for x in xs]
from random import randint
f = lambda x: x ** 2
xs = [randint(-1_000, 1_000) for _ in range(100_000)]
with timed('ii. py → [*map(…)]', prec=4):
    [*map(f, xs)]

from random import randint
f = lambda x: x ** 2
xs = [randint(-1_000, 1_000) for _ in range(100_000)]
with timed('iii. py → list(map(…))', prec=4):
    list(map(f, xs))

from numpy.random import randint
f = lambda x: x ** 2
xs = randint(-1_000, 1_000, size=100_000)
with timed('iv. numpy → vectorised', prec=4):
    f(xs)
Question: which is faster?
from numpy import empty_like, where
from numpy.random import randint
from contextlib import contextmanager
results = []
@contextmanager
def _attempt(*args, **kwargs):
    with timed(*args, **kwargs):
        yield
    results.append(res)  # `res` is the module-level result assigned inside each `with` body
xs = randint(-1_000, 1_000, size=1_500_000)
# x > 0 → x²
# x ≤ 0 → x³
with _attempt('i. `empty_like(…)`', prec=4):
    res = empty_like(xs)
    res[xs > 0] = xs[xs > 0] ** 2
    res[xs <= 0] = xs[xs <= 0] ** 3
res = xs.copy()
with _attempt('ii. `.copy()` & in-place', prec=4):
    res[xs > 0] **= 2
    res[xs <= 0] **= 3
with _attempt('iii. `numpy.where(…)`', prec=4):
    res = where(xs > 0, xs ** 2, xs ** 3)
with _attempt('iv. mask', prec=4):
    res = ((xs > 0) * (xs ** 2)) + ((xs <= 0) * (xs ** 3))
with _attempt('v. mask (precompute)', prec=4):
    mask = xs > 0
    res = mask * (xs ** 2) + ~mask * (xs ** 3)
# assert all((x == y).all() for x, y in zip(results, results[1:]))
# print('All equivalent!')
The results of the above may be perplexing, but they show that we need a better view of “performance” and “scale.”
First, let’s introduce a notation used frequently in Computer Science: “big-O.”
from time import sleep
def f(dataset):
    sleep(.5)        # set-up time
    for x in dataset:
        sleep(.01)   # ongoing time

def g(dataset):
    sleep(.01)       # set-up time
    for x in dataset:
        sleep(.02)   # ongoing time
with timed('i. f(<10 elements>)'):
    f(range(10))
with timed('ii. g(<10 elements>)'):
    g(range(10))
with timed('iii. f(<100 elements>)'):
    f(range(100))
with timed('iv. g(<100 elements>)'):
    g(range(100))
from numpy import linspace
from matplotlib.pyplot import plot, show
xs = linspace(1, 100, 101)
f_t = .5 + xs * 0.01
g_t = .01 + xs * 0.02
plot(xs, f_t)
plot(xs, g_t)
show()
Formally:
Big-O notation boils down to: f(x) = O(g(x)) as x → ∞ iff ∃ M > 0 and x₀ ∈ ℝ s.t. |f(x)| ≤ M·g(x), ∀ x ≥ x₀.
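As a minimal numeric illustration of this definition (the witnesses M and x₀ below are chosen by hand and are purely illustrative):
f = lambda x: 3 * x + 10
g = lambda x: x
M, x0 = 4, 10
# 3x + 10 ≤ 4x whenever x ≥ 10, so f(x) = O(g(x)) = O(x)
assert all(abs(f(x)) <= M * g(x) for x in range(x0, 1_000_000))
print('f(x) = 3x + 10 is O(x)')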
Other “Bachmann–Landau” notations:
See Wikipedia: Big-O Notation for more background.
But, typically, we will only care about (and be able to assess) Big-O.
Within Big-O, we typically ignore constant factors and lower-order terms; e.g., an operation that takes 3n + 10 steps is simply O(n).
from numpy.random import randint
xs = randint(-1_000, 1_000, size=1_000_000)
ys = xs ** 2 # O(n)
from random import randint
xs = [randint(-1_000, 1_000) for _ in range(1_000_000)]
ys = [x**2 for x in xs] # O(n)
from itertools import islice
from random import randint
xs = [randint(-1_000, 1_000) for _ in range(100_000)]
results = {}
with timed('i. `list` vs `set` vs `dict`'):
    uniques = []
    for x in xs:              # O(n)
        if x not in uniques:  # O(n) → O(n²)
            uniques.append(x)
results['list'] = uniques
with timed('ii. `list` vs `set` vs `dict`'):
    uniques = {*''}
    for x in xs:              # O(n)
        if x not in uniques:  # O(1) → O(n)
            uniques.add(x)
results['set'] = uniques
# print(f'{[*islice(results["list"], 10)] = }')
# print(f'{[*islice(results["set" ], 10)] = }')
with timed('iii. `list` vs `set` vs `dict`'):
    uniques = {}
    for x in xs:              # O(n)
        if x not in uniques:  # O(1) → O(n)
            uniques[x] = None
    uniques = [*uniques]
results['dict'] = uniques
print(f'{[*islice(results["list"], 10)] = }')
print(f'{[*islice(results["dict"], 10)] = }')
Typically, we can identify and discuss algorithms and operations on data structures in terms of the following bounding functions: O(1) (“constant”), O(log n) (“logarithmic”), O(n) (“linear”), O(n·log n) (“linearithmic”), and O(n²) (“quadratic”).
Let’s look at a comparison of membership testing (x in xs) against a Python list versus a Python set:
from random import randint
for size in [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]:
    xs = [randint(-1_000, 1_000) for _ in range(size)]
    with timed(f'… in […] {size = :<9,}', prec=6):
        99_999 in xs
This makes sense: the Python list is solely “human-ordered.” As a consequence, there is insufficient machine-available structure to perform this search without checking every single element, and so we should see a direct relationship between the number of elements and the time it takes to perform the search.
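To see what that looks like, here is a minimal sketch of what `value in some_list` effectively has to do (illustrative only; CPython implements this loop in C):
def list_contains(xs, target):
    for x in xs:           # check every element, in insertion order → O(n)
        if x == target:
            return True
    return False

print(list_contains(['abcd', 'wxyz'], 'wxyz'), list_contains(['abcd', 'wxyz'], 'xxxxx'))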
from random import randint
for size in [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]:
    xs = {randint(-1_000, 1_000) for _ in range(size)}
    with timed(f'… in {size = :<9,}', prec=6):
        99_999 in xs
This makes sense: the Python set is solely “machine-ordered.” As a consequence, there is sufficient machine-available structure to perform this search without checking every single element. Instead, the Python interpreter computes a hash of the search key, uses that hash to identify the likely position where this data would be stored, and directly checks that position. In the case of a possible collision (since the hash value must be compressed to an array index, and, as the domain of the array indices will be smaller than the domain of possible input values, by the pigeonhole principle there must be overlaps), the search through the structure should be relatively short. This is managed by the Python set ensuring that its “load factor” is always below a certain amount, thus mitigating the possibility of collisions turning this search into a linear scan.
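A toy sketch of this hash-then-probe idea (illustrative only; CPython’s real set uses open addressing with perturbation and resizes to keep the load factor low):
class ToySet:
    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]
    def add(self, x):
        self.buckets[hash(x) % len(self.buckets)].append(x)
    def __contains__(self, x):
        # hash → bucket index → short scan of the (ideally few) colliding entries
        return x in self.buckets[hash(x) % len(self.buckets)]

s = ToySet()
s.add('abcd')
print('abcd' in s, 'xxxxx' in s)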
We typically see constant time operations corresponding to operations where we have sufficient machine structuring to directly identify the answer, e.g.,
- lookup by key (in a Python dict or Python set)
- lookup by position (in a Python list or numpy.ndarray)
- fixed-width arithmetic (e.g., float addition but not int addition)

for val in (10**x for x in range(20)):
    x = float(val) - 1
    with timed(f'float(…) + 1 = {val:<9,e}', prec=6):
        x + 1
for val in (10**x for x in range(20)):
    x = int(val) - 1
    with timed(f'int(…) + 1 = {val:<9,e}', prec=6):
        x + 1
for val in [1, 10, 10**10, 10**20, 10**30, 10**40]:
    x = int(val) - 1
    with timed(f'int(…) + 1 = {val:<9,e}', prec=6):
        x + 1
for p, val in ((p, 10**p) for p in [0, 1, 10, 100, 1_000, 10_000, 100_000, 1_000_000, 2_000_000, 4_000_000]):
    x = int(val) - 1
    with timed(f'int(…) + 1 = 10**{p:<1,.0f}', prec=6):
        x + 1
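This makes sense: a Python float is a fixed-width (64-bit) value, so adding 1 to it always costs roughly the same; a Python int is arbitrary-precision, so the cost of its arithmetic grows with the number of digits in its operands.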
Let’s take a look at common operations of the following complexity: O(log n) (“logarithmic”) and O(n·log n) (“linearithmic”).
Recall:
from matplotlib.pyplot import plot, show
from numpy import linspace, log
xs = linspace(1, 10, 100)
plot(xs, log(xs), color='red')        # O(ln x)
plot(xs, xs, color='green')           # O(x)
plot(xs, xs * log(xs), color='blue')  # O(x·ln x)
show()
Logarithmic time complexity is typically encountered when we can “divide and conquer.”
For example, searching through a structure that is already sorted.
from numpy.random import randint
from numpy import searchsorted
xs = randint(-1000, 1000, size=1_000_000)
with timed('i. unsorted', prec=6):
    99_999 in xs
xs.sort()
with timed('ii. sorted', prec=6):
    searchsorted(xs, 99_999)
from numpy.random import randint
for size in [10**x for x in range(8)]:
    xs = randint(-1000, 1000, size=size)
    with timed(f'unsorted {size = :,}', prec=6):
        99_999 in xs
from numpy.random import randint
from numpy import searchsorted
for size in [10**x for x in range(8)]:
    xs = randint(-1000, 1000, size=size)
    xs.sort()
    with timed(f'sorted {size = :,}', prec=6):
        searchsorted(xs, 99_999)
This makes sense; if the structure is sorted, we can look at the median element, determine if the element we want is larger or smaller, and immediately eliminate half of the items we want to compare. At each successive step, we can eliminate half of the items under consideration!
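Here is a minimal sketch of that halving process, a binary search over an already-sorted sequence (illustrative only; the standard library’s bisect module and numpy.searchsorted provide the real thing):
def binary_contains(xs, target):
    lo, hi = 0, len(xs)
    while lo < hi:                 # each pass halves the remaining search space → O(log n)
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        elif xs[mid] > target:
            hi = mid
        else:
            return True
    return False

print(binary_contains(sorted([3, 1, 4, 1, 5, 9, 2, 6]), 5))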
Recall: the pandas index can be sorted & knows that it is sorted!
from pandas import Series
from numpy.random import randint
s = Series(0, index=randint(-1000, 1000, size=10))
print(f'{s.index = }')
print(f'{s.index.is_monotonic = }')
# s = s.sort_index()
# print(f'{s.index = }')
# print(f'{s.index.is_monotonic = }')
from pandas import Series
from numpy.random import randint, normal
for size in [10**x for x in range(8)]:
    s = Series(normal(size=size), index=randint(-1_000, 1_000, dtype='int16', size=size))
    s = s.sort_index()
    with timed(f'.loc with {size = :,}', prec=6):
        try:
            # s.iloc[0]
            s.loc[s.index[0]]
        except KeyError:
            pass
(Well, the world is not always fair.)
Linearithmic operations often appear in fast sorts themselves! (The fastest possible comparison sort is linearithmic in time.)
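For intuition, here is a minimal merge-sort sketch (illustrative only; this is not how list.sort or ndarray.sort are actually implemented): splitting gives roughly log₂ n levels, and merging at each level is O(n) work, for O(n·log n) overall.
from heapq import merge

def merge_sort(xs):
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    # recursively sort each half, then merge the two sorted halves in linear time
    return list(merge(merge_sort(xs[:mid]), merge_sort(xs[mid:])))

print(merge_sort([3, 1, 4, 1, 5, 9, 2, 6]))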
from numpy.random import randint
from numpy import searchsorted
for size in [10**x for x in range(8)]:
    xs = randint(-1000, 1000, size=size)
    with timed(f'sort {size = :,}', prec=6):
        xs.sort()
Let’s take a look at common operations of the following complexity: O(n²) (“quadratic”).
Quadratic operations are very dangerous in practice, because they grow so quickly that code that works fine on a test dataset can quickly prove completely intractable on a production dataset.
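For example, here is a classic accidentally-quadratic pattern (the has_duplicates helpers below are hypothetical and purely illustrative), alongside a linear-time alternative:
def has_duplicates_quadratic(xs):
    # compare every pair of elements: n·(n-1)/2 comparisons → O(n²)
    return any(xs[i] == xs[j] for i in range(len(xs)) for j in range(i + 1, len(xs)))

def has_duplicates_linear(xs):
    # hashing into a set makes this roughly O(n)
    return len(set(xs)) != len(xs)

xs = [*range(2_000)]  # no duplicates, so both checks must look at everything
with timed('O(n²) duplicate check', prec=4):
    has_duplicates_quadratic(xs)
with timed('O(n)  duplicate check', prec=4):
    has_duplicates_linear(xs)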
from numpy import convolve
from numpy.random import normal
for size in [*(10**x for x in range(1, 7)), 2_500_000, 5_000_000, 10_000_000]:
    xs = normal(size=size//2)
    # k = normal(size=3)
    k = normal(size=size//2)
    with timed(f'convolve {size = :,}', prec=6):
        convolve(xs, k)
So how do we operationalise this information?
- remember that sorting is itself (at best) O(n·log n) and try to avoid repeated sorting (or loss of structure leading to repeated sorting)
- remember that a dict is both “human-” (“insertion-”) and “machine-”ordered in Python ≥3.6
- use a set or a dict where possible; a sorted structure otherwise
- don’t chase micro-optimisations (e.g., [*map(f, xs)] being faster than list(map(f, xs)) or [f(x) for x in xs]); these may not persist from interpreter version to interpreter version

Simple example:
from pandas import to_datetime
from datetime import timedelta
us_holidays = to_datetime(['2021-01-01', '2021-07-04', '2021-05-31', '2021-12-25'])
def nth_business_date(dt, n, holidays=us_holidays):
    while n > 0:
        dt += timedelta(days=1)
        while dt.weekday() in {5, 6} or dt in holidays:
            dt += timedelta(days=1)
        n -= 1
    return dt
for _ in range(1):
    today = to_datetime('today')
    print(f'{today = }')
    nbd = nth_business_date(today, n=2)
    print(f'{nbd = }')
    dts = [
        nth_business_date(today, n=30),
        nth_business_date(today, n=60),
        nth_business_date(today, n=90),
    ]
    print(f'{dts = }')
from pandas import to_datetime
from datetime import timedelta
us_holidays = {*to_datetime(['2021-01-01', '2021-07-04', '2021-05-31', '2021-12-25'])}
def nth_business_date(dt, n, holidays=us_holidays):
    while n > 0:
        dt += timedelta(days=1)
        while dt.weekday() in {5, 6} or dt in holidays:
            dt += timedelta(days=1)
        n -= 1
    return dt
today = to_datetime('today')
nbd = nth_business_date(today, n=2)
dts = [
    nth_business_date(today, n=30),
    nth_business_date(today, n=60),
    nth_business_date(today, n=90),
]
for _ in range(1):
    print(f'{today = }')
    print(f'{nbd = }')
    print(f'{dts = }')
from pandas import to_datetime
from datetime import timedelta
from itertools import islice, count
us_holidays = {*to_datetime(['2021-01-01', '2021-07-04', '2021-05-31', '2021-12-25'])}
def business_dates(dt, holidays=us_holidays):
    all_dates = (dt + timedelta(days=n) for n in count())
    return (dt for dt in all_dates if dt.weekday() not in {5, 6} and dt not in holidays)
today = to_datetime('today')
nbd = [*islice(business_dates(today), 2)]
dts = [*islice(business_dates(today), 91)]
dts = dts[30], dts[60], dts[90]
for _ in range(1):
    print(f'{today = }')
    print(f'{nbd = }')
    print(f'{dts = }')