{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Something cool with higher order functions\n", "\n", "In the last lecture, we looked at higher order functions: the idea that you can write functions that use _other functions_ as arguments or as return values. One cool consequence of higher order functions is that you don't need multi-argument functions anymore; you only ever need functions that accept one argument.\n", "\n", "> You might think this is trivial: if I want to write a function that takes two integers, just write a function that accepts a structure with two integer fields. The trick here is more subtle than that: we will not use any notion of \"tuples\" (single pieces of data that actually bundle multiple pieces of data).\n", "\n", "Consider a simple function of two arguments:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "17\n" ] } ], "source": [ "def myFun(x, y) :\n", "    return 3 * x + y\n", "\n", "print myFun(3, 8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can write a version of this function that only ever accepts one argument at a time. What we're going to do is take advantage of _closures_ (remember Lecture 3) to write a function that takes the _first_ argument, then returns a _new function_ that incorporates the first argument and accepts the second argument. We can then call this new function on the second argument to produce the same result as the original function." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "17\n" ] } ], "source": [ "def myFunCurry(x) : #note that this only takes one argument!\n", "    def inner(y) : #this function takes the second argument!\n", "        return 3 * x + y\n", "    return inner\n", "\n", "print myFunCurry(3)(8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's deconstruct what happened. 
When we call `myFunCurry(3)`, we're getting back a new function that _closed_ over `3`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "inter = myFunCurry(3)\n", "print inter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function behaves the same as one we could have written by hand with `3` substituted in for `x`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "def inter2(y) :\n", "    return 3 * 3 + y\n", "print inter2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These new functions can then accept `y` as their argument to finish the computation:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "for i in range(1, 100) :\n", "    for j in range(1, 100) :\n", "        assert(myFun(i, j) == myFunCurry(i)(j))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can generalize this to functions of 3 arguments:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "26\n" ] } ], "source": [ "def myFun3(x, y, z) :\n", "    return x ** 2 + 3 * y + z\n", "\n", "print myFun3(3, 4, 5)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "26\n" ] } ], "source": [ "def myFun3Curry(x) :\n", "    def inner1(y) :\n", "        def inner2(z) :\n", "            return x ** 2 + 3 * y + z\n", "        return inner2\n", "    return inner1\n", "\n", "print myFun3Curry(3)(4)(5)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "for i in range (1, 100) :\n", "    for j in range (1, 100) :\n", "        for k in range (1, 100) :\n", "            assert(myFun3(i, j, k) == myFun3Curry(i)(j)(k))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We 
call this process (moving from a function that takes `k` arguments to a series of functions that each take 1 argument) _Currying_. \"Currying\" is named after Haskell Curry -- and so is the Haskell programming language!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data structures\n", "\n", "We have already seen two basic data structures in Python. First, we saw lists:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<type 'list'>\n", "[0, 2, 4, 6, 8]\n", "[4, 6]\n", "[0, 2, 4, 6, 8, 10]\n" ] } ], "source": [ "list1 = [0, 2, 4, 6, 8]\n", "print type(list1)\n", "print list1\n", "print list1[2:4]\n", "list2 = list1 + [10]\n", "print list2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait, two data structures? Yes! Strings in Python are a data structure too. In fact, like lists, strings are a _sequence_ data structure that supports several of the same operations as lists:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<type 'str'>\n", "5\n", "ell\n", "Hello!\n", "H\n", "e\n", "l\n", "l\n", "o\n", "!\n" ] } ], "source": [ "string1 = 'Hello'\n", "print type(string1)\n", "print len(string1)\n", "print string1[1:4]\n", "string2 = string1 + '!'\n", "print string2\n", "for s in string2 :\n", "    print s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tuples\n", "Another sequence type in Python is the _tuple_. These look a lot like lists, with a few exceptions. First, you define them with `( )` instead of `[ ]`. Second, tuples are _immutable_. Once you define them, you cannot add, remove, or replace their items. Think of tuples as a way of defining structures. 
You can get at the elements of tuples by indexing into them, just like lists or strings:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('Hello', 3.14, 2) <type 'tuple'>\n", "3.14 <type 'float'>\n" ] } ], "source": [ "tuple1 = ('Hello', 3.14, 2)\n", "print \"{} {}\".format(tuple1, type(tuple1))\n", "print \"{} {}\".format(tuple1[1], type(tuple1[1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And you can get at elements of a tuple by iterating over them (again, just like lists or strings):" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello <type 'str'>\n", "3.14 <type 'float'>\n", "2 <type 'int'>\n" ] } ], "source": [ "for t in tuple1 :\n", "    print \"{} {}\".format(t, type(t))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a fancier way to iterate over a tuple:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello <type 'str'>\n", "Hello <type 'str'>\n", "3.14 <type 'float'>\n", "3.14 <type 'float'>\n", "2 <type 'int'>\n", "2 <type 'int'>\n" ] } ], "source": [ "for i, t in enumerate(tuple1) :\n", "    print \"{} {}\".format(t, type(t))\n", "    print \"{} {}\".format(tuple1[i], type(tuple1[i]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's going on with `enumerate` up there? That's a built-in function for iterating through sequence types (meaning you can use it on strings and lists, too) that emits _tuples_ as its output. The tuples it emits are of the form `(index, value)`. The looping code takes advantage of a handy Python trick called _unpacking_ that lets you get at the elements of a tuple without having to index into it."
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello\n", "3.14\n", "2\n" ] } ], "source": [ "s, f, i = tuple1\n", "print s\n", "print f\n", "print i" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(0, 'Hello')\n", "(1, 3.14)\n", "(2, 2)\n" ] } ], "source": [ "for packed in enumerate(tuple1) :\n", " print packed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using tuples as your replacement for C-like structs can be tricky, if the tuples get complicated (think about how hard it might be to remember the organization of the tuple). Python provides _named tuples_ as a way around this, which we will get to when we talk about objects." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sets\n", "\n", "Python includes _sets_ as a built-in data type. They operate just like Java sets or STL sets: unordered groups of elements that maintain a _uniqueness_ property, where each value only appears once in the set" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "set(['a', 'c', 'b'])\n" ] } ], "source": [ "set1 = {'a', 'b', 'c'}\n", "print set1 #note the ordering!" 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "set(['a', 'c', 'b'])\n" ] } ], "source": [ "set2 = {'a', 'b', 'c', 'a'}\n", "print set2" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "set(['a', 'c', 'b', 'd'])\n", "set(['c', 'b', 'd'])\n" ] } ], "source": [ "set2.add('d')\n", "print set2\n", "set2.remove('a')\n", "print set2" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "set([])\n", "set(['a', 'b'])\n" ] } ], "source": [ "set3 = set() #empty set initialization\n", "print set3\n", "set3.add('a')\n", "set3.add('b')\n", "set3.add('a')\n", "print set3" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "c\n", "b\n", "d\n" ] } ], "source": [ "for d in set2 :\n", " print d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Comprehensions\n", "\n", "Python provides set and list _comprehensions_, which are efficient ways of processing sets and lists to produce new sets and lists (think mathematical set notation)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-1, 3, 2, -2, 3, 2, 0, 3, -2, 2, -2, -3, 3, -2, -1, -2, 0, -4, -2, 0, -4, 3, -2, -4, 3]\n" ] } ], "source": [ "import numpy as np\n", "data = list(np.random.randint(-4, 4, 25))\n", "print data" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 3, 2, 2, 3, 2, 0, 3, 2, 2, 2, 3, 3, 2, 1, 2, 0, 4, 2, 0, 4, 3, 2, 4, 3]\n" ] } ], "source": [ "absdata = [abs(d) for d in data]\n", "print absdata" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "set([0, 1, 2, 3, 4])\n" ] } ], "source": [ "absset = {abs(d) for d in data}\n", "print absset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dictionaries\n", "\n", "The final \"basic\" data structure in Python is the _dictionary_. (Other languages call them \"associative arrays.\" You probably know them as \"maps\"): data structures that let you map _keys_ to _values_. Each key in a Python dictionary is unique, and that key maps to a certain value." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 3\n" ] } ], "source": [ "dict1 = {'a': 0, 'b': 1, 'c': 3}\n", "print dict1['a'], dict1['c']" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10 3\n" ] } ], "source": [ "dict1['a'] = 10\n", "print dict1['a'], dict1['c']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When iterating over a dictionary, you iterate over the keys. If you want to iterate over both the keys and the values, use `iteritems`" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a 10\n", "c 3\n", "b 1\n", "a 10\n", "c 3\n", "b 1\n" ] } ], "source": [ "for k in dict1 :\n", " print k, dict1[k]\n", " \n", "for k, v in dict1.iteritems() :\n", " print k, v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait, what's going on with `iteritems`? We're not calling it like we do other functions like `len` or `min` or `max`. `iteritems` is a _method_ of the `dict` _class_. `dict1` in the above example (like _all Python data_) is an _object_. 
(We saw similar ways of calling methods when we `append` items to lists, or `add` items to sets.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Classes and Objects\n", "\n", "> This is not a particularly formal introduction to the Python data model and object model. For that, please refer to documentation on [the Python data model](https://docs.python.org/2/reference/datamodel.html) and [Python classes](https://docs.python.org/2/tutorial/classes.html).\n", "\n", "Python, like C++ and Java, is _object oriented_. The basic data model in Python is that everything is an object of some sort. An object combines data and methods. _Everything in Python is an object_, including \"simple\" data like integers and floats.\n", "\n", "A _class_ in Python defines a set of _attributes_: these can be variables or methods. This defines a set of properties that you want all objects of a certain type to have. An _object_ in Python is an _instance_ of a class: it shares the class's attributes with all other instances of that class, but can also have attributes (think: member data) that are different from those of other instances. This lets you have objects with their own \"local\" data.\n", "\n", "Methods for a class take an extra `self` argument. When you invoke a method on an object (think `myList.append(x)`), this `self` argument refers to the object you invoked the method on (in the example, `myList`)."
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "class Counter (object) :\n", "    totalCount = 0 #shared number across all instances\n", "\n", "    def __init__(self) : #constructor for the class.\n", "        self.count = 0 #local count for each instance\n", "\n", "    def incr(self) :\n", "        Counter.totalCount += 1\n", "        self.count += 1\n", "\n", "    def __str__(self) : #special function like \"toString\" in Java\n", "        return \"Total count: {}, Local count: {}\".format(Counter.totalCount, self.count)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total count: 0, Local count: 0\n", "Total count: 0, Local count: 0\n" ] } ], "source": [ "c1 = Counter()\n", "c2 = Counter()\n", "print c1\n", "print c2" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total count: 10, Local count: 5\n", "Total count: 10, Local count: 5\n" ] } ], "source": [ "for i in range(0,5) :\n", "    c1.incr()\n", "    c2.incr()\n", "\n", "print c1\n", "print c2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Classes themselves, like functions, are just objects, as are the methods inside them:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<type 'type'>\n", "<type 'instancemethod'>\n" ] } ], "source": [ "print type(Counter)\n", "print type(Counter.incr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unsurprisingly, like with functions, Python lets you create new classes dynamically and return them. 
This gives us a handy way to create things that behave like structures, using the `namedtuple` function from the `collections` module:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "import collections\n", "Point = collections.namedtuple('Point',['x', 'y', 'color'])" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Point(x=2.4, y=3.7, color='red')\n" ] } ], "source": [ "p = Point(2.4, 3.7, 'red')\n", "print p" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.4 3.7 red\n" ] } ], "source": [ "print p.x, p.y, p.color" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas\n", "\n", "The place where you will probably be using classes the most is when manipulating pandas _dataframes_: `DataFrame` is the key class provided by pandas (in addition to `Series`), and it provides a number of instance methods for manipulating data. We will not spend a lot of time deconstructing pandas dataframes in class -- we will explain as much as is needed in relevant homeworks. 
You can also look at the [docs](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "data = pd.read_csv('hw02_problem3.csv', header=None, skipinitialspace=True)\n", "print type(data)\n", "data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data1 = data[(data[2] == 'white')]\n", "print type(data1)\n", "data1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data2 = data[(data[2] == 'white')][[0, 1]]\n", "data2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 2 }