《计量经济学编程——以Python语言为工具》(严子中、张毅)

Chapter 2: Numerical Python
Numerical Computing in Python

March, 2024

Companion slides for Chapter 2

Outline

  • Motivations
  • Numerical Computation
  • Data Manipulation
  • Data Importing and Exporting
  • Pass Data Between Python and Stata

Motivations

  • For applied economic analysis, we need to
    • import and manipulate the data,
    • generate descriptive statistics, and
    • transform the data into specific shapes for econometric models.

Motivations

  • When estimating the model, we need to perform
    • numerical computations,
    • such as linear algebra operations.
  • Python programming provides an advantage in tackling these complex tasks with appropriate packages.
  • This chapter will focus on these points and introduce key Python packages, including NumPy and Pandas.

Numerical Computation

  • The NumPy package (read as NUMerical PYthon)
    • provides access to a new data structure called arrays,
    • which allows efficient vector and matrix manipulations and extensive linear algebra operations.
  • NumPy offers a rich set of functions.
  • We first learn the basics essential for econometric programming.
  • For further reading, see the NumPy online documentation (numpy.org/doc/).

Numerical Computation: arrays

  • NumPy provides a new data type called “array”.
  • An array resembles a list, but
    • it can only hold elements of the same type, and
    • it stores data more efficiently.

Numerical Computation: arrays

  • As a result,
    • arrays are the preferred data structure for numerical calculations,
    • especially when working with vectors and matrices.
  • In NumPy, vectors, matrices, and tensors with more than two indices are all referred to as arrays.
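As a small illustration of why arrays suit numerical work, the sketch below contrasts a list with an array. The names prices_list, prices_arr, and mixed are illustrative and do not come from the slides. Arithmetic on an array applies to every element, whereas the same operator on a list repeats the list.

```python
import numpy as np

prices_list = [1.5, 2.0, 3.5]          # a plain Python list
prices_arr = np.array(prices_list)     # the same values as a NumPy array

doubled_list = prices_list * 2         # list repetition: six elements
doubled_arr = prices_arr * 2           # elementwise multiplication: three elements

# Mixed inputs are coerced to one common type inside an array
mixed = np.array([1, 2.5])             # the integer 1 is stored as the float 1.0

print(doubled_list)
print(doubled_arr)
print(mixed.dtype)
```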

Numerical Computation: one-dimensional arrays

  • Import NumPy into the current Python instance and assign it a local name.
import numpy as np
vec = np.array([-0.3,-2.5,3.1,4,5])
vec 
array([-0.3, -2.5,  3.1,  4. ,  5. ])

Numerical Computation: one-dimensional arrays

  • The numpy.array() function converts a list (or tuple) into an array, and arithmetic then applies to each element.
print(vec**2)
print(np.abs(vec)) # numpy.abs() returns absolute values
    [ 0.09  6.25  9.61 16.   25.  ]
    [0.3 2.5 3.1 4.  5. ]

Numerical Computation: one-dimensional arrays

  • The numpy.arange() function is similar to the built-in range() function.
np.arange(0,10,1)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
  • Here,
    • the start and step arguments are both optional, with default values of zero and one, respectively.
    • To verify these defaults, one can execute the command np.arange(10).

Numerical Computation: one-dimensional arrays

  • If the values of start, stop, and step are integers, the returned values match those produced by the built-in range() function.
  • To obtain the same result as np.arange(0,10,1) using range():
np.array(list(range(0,10,1)))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Numerical Computation: one-dimensional arrays

  • The numpy.linspace(start, stop, num=50) function
    • also provides evenly spaced samples,
    • with the difference that the num parameter specifies the number of samples taken over the interval [start, stop].
# Use linspace to take 10 evenly spaced values from 0 to 2
np.linspace(0,2,10)
array([0.        , 0.22222222, 0.44444444, 0.66666667, 0.88888889,
     1.11111111, 1.33333333, 1.55555556, 1.77777778, 2.        ])
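With the endpoint included, the spacing used by linspace() is (stop - start)/(num - 1), which is why the values above step by 2/9 rather than 0.2. A minimal check, using the retstep option of numpy.linspace() (the names grid and step are illustrative):

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples
grid, step = np.linspace(0, 2, 10, retstep=True)

print(step)                 # spacing with the endpoint included
print((2 - 0) / (10 - 1))   # the same value computed by hand
```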

Numerical Computation: one-dimensional arrays

Substituting arange for linspace gives ten evenly spaced values on [0,2).

# Use arange with a step of 0.2 from 0 to 2
np.arange(0,2,0.2)
array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8])

Numerical Computation: one-dimensional arrays

With the endpoint=False option, linspace() excludes the endpoint of the interval:

# Use linspace to take 10 evenly spaced nodes from 0 to 2, excluding the endpoint
np.linspace(0,2,10,endpoint=False)
array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8])

Numerical Computation: one-dimensional arrays

Note that objects generated by these NumPy functions are of type numpy.ndarray, not list.

type(np.arange(10)), type(list(range(0,10,1)))
(numpy.ndarray, list)

Numerical Computation: one-dimensional arrays

To convert a NumPy array to a list or tuple, we can use the built-in list() or tuple().

veclist = list(vec)   # convert vec to a list
vectuple = tuple(vec) # convert vec to a tuple
type(vec), type(veclist), type(vectuple)
(numpy.ndarray, list, tuple)

Numerical Computation: two-dimensional arrays

To create a two-dimensional array,

mat = np.array([[1,2],[3,4],[5,6]])
mat
array([[1, 2],
       [3, 4],
       [5, 6]])

Numerical Computation: two-dimensional arrays

  • This two-dimensional array is a 3-by-2 matrix.
  • We can view the shape of an array with:
np.shape(mat)
(3, 2)


Numerical Computation: two-dimensional arrays

  • numpy.zeros() and numpy.ones()
    • allow one to create arrays of a specified shape filled with 0 and 1, respectively.
np.zeros(3) # an array of shape (3,) filled with zeros
array([0., 0., 0.])
np.zeros((1,3)) # an array of shape (1,3) filled with zeros
array([[0., 0., 0.]])

Numerical Computation: two-dimensional arrays

  • np.zeros(3) gives a 1d-array with three elements,
  • whereas np.zeros((1,3)) returns a matrix-like 2d-array with 1 row and 3 columns.
np.zeros(3).shape, np.zeros((1,3)).shape
    ((3,), (1, 3))

Numerical Computation: two-dimensional arrays

The numpy.ones() function generates arrays filled with ones:

np.ones((1,3))
array([[1., 1., 1.]])

Numerical Computation: two-dimensional arrays

  • Array dimensions often need to match before a computation.
  • numpy.reshape(array, newshape) changes the shape without altering the data within the array.
mat.reshape((1,6))
array([[1, 2, 3, 4, 5, 6]])

Numerical Computation: two-dimensional arrays

Although a 1d-array is just a sequence of elements, reshape() can transform it into a 2d-array, such as a row or column vector.

np.reshape(np.arange(1,7),(1,6))
array([[1, 2, 3, 4, 5, 6]])

Numerical Computation: indexing

We can utilize Python's indexing and slicing rule to access specific values. For example,

vec = np.array([-0.3,-2.5,3.1,4,5])
vec[-1], vec[:3]
(5.0, array([-0.3, -2.5,  3.1]))

Numerical Computation: indexing

To access and set elements in a 2D array,

mat = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
mat[0,0], mat[0,1]
(1, 2)
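The same index notation also assigns values. A minimal sketch follows; it works on a copy named mat_new (an illustrative name, created with the array's copy() method) so that mat itself stays unchanged for the slides that follow.

```python
import numpy as np

mat = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
mat_new = mat.copy()        # work on a copy; mat is left untouched

mat_new[0,0] = 100          # set a single element
mat_new[:,1] = 0            # set the whole second column
print(mat_new)
```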

Numerical Computation: slicing

To extract the subarray consisting of the first two rows and the columns with indices 1 and 2,

mat[0:2,1:3]
array([[2, 3],
       [6, 7]])

Numerical Computation: slicing

print(mat[:,1])
print(mat[:,1:2])
print(mat[1,:])
print(mat[1:2,:])
[ 2  6 10]
[[ 2]
 [ 6]
 [10]]
[5 6 7 8]
[[5 6 7 8]]

Numerical Computation: indexing and slicing

To perform calculations for specific elements in an array, one can use the indexing and slicing method:

print(mat[0:2,0:2]+1)
[[2 3]
 [6 7]]

Numerical Computation: NumPy Data Types

Consider the following numpy arrays

a = np.array([1,2])
b = np.array([1.0,2])
a.dtype, b.dtype
    (dtype('int64'), dtype('float64'))

Numerical Computation: Numpy Data Types

  • The .dtype attribute describes the data type of the elements in a NumPy array.
  • a holds integers and b holds floats.
  • Notice that type(b) always returns numpy.ndarray.

Numerical Computation: NumPy Data Types

  • In econometric computation, specifying a particular data type for arrays is common in practice.
  • This can be achieved using the dtype option in functions.
c = np.array([1,2], dtype='float32')
d = np.array([1,2], dtype='float16')
c.dtype, d.dtype
(dtype('float32'), dtype('float16'))

Numerical Computation: NumPy Data Types

Examples:

e = np.zeros((5))
f = np.zeros((5), dtype='int')
e.dtype, f.dtype
(dtype('float64'), dtype('int64'))
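The data type also governs how values are stored. The sketch below covers two points not shown above: astype() returns a converted copy of an existing array, and assigning a float into an integer array drops the fractional part without warning. The names g and h are illustrative.

```python
import numpy as np

g = np.array([1, 2])            # integer array
h = g.astype('float64')         # converted copy with float entries

g[0] = 1.7                      # stored in an integer array: truncated to 1
h[0] = 1.7                      # stored in a float array: kept as 1.7

print(g, g.dtype)
print(h, h.dtype)
```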

Numerical Computation: linear algebra operations

  • NumPy can execute standard linear algebra operations, including matrix transpose, inverse, and multiplication, which are extensively employed in econometrics.
  • Its linear algebra toolbox is highly effective.
  • We list a few key usages below.

Numerical Computation: transpose, inverse and eigenvalues

The .T attribute transposes a matrix.

x = np.array([[1,2], [3,4]])
x.T
array([[1, 3],
       [2, 4]])

Numerical Computation: transpose

Please be aware that transposing a 1-dimensional array results in no change.

v = np.array([1,2,3])
v.T
array([1, 2, 3])

Numerical Computation: inverse

  • In econometrics, we use the inverse of a matrix very often.
  • In Python, using NumPy can help us compute a matrix's inverse efficiently.
  • One can use the numpy.linalg.inv() function:
np.linalg.inv(x)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
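In econometrics the inverse appears most prominently in the OLS formula $\hat\beta = (X'X)^{-1}X'y$. A minimal sketch with made-up, noise-free data follows; X_demo, y_demo, and beta_true are illustrative names that are not part of the slides. numpy.linalg.solve() is included as an alternative that avoids forming the inverse explicitly.

```python
import numpy as np

# Made-up data: a constant and one regressor, with y generated without noise
X_demo = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
beta_true = np.array([2.0, 0.5])
y_demo = X_demo @ beta_true

# OLS via the explicit inverse of X'X
beta_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Same estimate by solving the normal equations (X'X) b = X'y
beta_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

print(beta_inv, beta_solve)
```

Because the data contain no noise, both estimates should coincide with beta_true up to rounding error.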

Numerical Computation: eigenvalues

Here is a small example that computes the eigenvalues and eigenvectors of an identity matrix using the numpy.linalg.eig() function:

# Generate the 3-by-3 identity matrix
I = np.eye(3) 
print("I(3):\n", I)
eig_values, eig_vector = np.linalg.eig(I)
print("Eigenvalues of I(3):\n", eig_values)
print("Eigenvectors of I(3):\n", eig_vector)

Numerical Computation: eigenvalues

I(3):
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Eigenvalues of I(3):
 [1. 1. 1.]
Eigenvectors of I(3):
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
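For a matrix other than the identity, the defining relation S v = λ v can be checked numerically. A small sketch with a made-up symmetric matrix S (the names S, vals, and vecs are illustrative); the eigenvectors are stored as the columns of vecs:

```python
import numpy as np

S = np.array([[2.0, 1.0], [1.0, 2.0]])   # made-up symmetric matrix
vals, vecs = np.linalg.eig(S)

# Column j of vecs pairs with vals[j]; vecs * vals scales each column by its eigenvalue
print(np.allclose(S @ vecs, vecs * vals))
print(np.sort(vals))
```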

Numerical Computation: NumPy's arithmetical operations

Entrywise operations, i.e., addition (+ or add()), subtraction (- or subtract()), multiplication (* or multiply()) and division (/ or divide()), apply to two NumPy arrays with the same shape:

x = np.array([[1,2], [3,4]])
y = np.array([[5,6], [7,8]]) 
print(x+y)
print(np.add(x,y))
print(np.subtract(x,y))

Numerical Computation: NumPy's arithmetical operations

[[ 6  8]
 [10 12]]
[[ 6  8]
 [10 12]]
[[-4 -4]
 [-4 -4]]

Numerical Computation: NumPy's arithmetical operations

# Entrywise multiplication of the two matrices x and y
print(x*y)
print(np.multiply(x,y))
[[ 5 12]
 [21 32]]
[[ 5 12]
 [21 32]]

Numerical Computation: NumPy's arithmetical operations

  • It is important to note that
    • NumPy can sometimes perform the computation
    • even if the two arrays have different shapes (this is known as broadcasting).
  • Let us consider the following cases.
  1. If x is an (m,k)-shaped matrix and y is a (k,) 1d-array, x+y gives an (m,k) matrix in which y is added elementwise to each row of x.
x = np.array([[1,2], [3,4], [5,6]])
y = np.array([10,20])
x+y

array([[11, 22],
       [13, 24],
       [15, 26]])

  2. Adding an (m,k)-shaped matrix and a (1,k)-shaped 2d-array (row vector) yields the same result as above:
y = np.array([10,20]).reshape((1,2))
x+y
array([[11, 22],
       [13, 24],
       [15, 26]])
  3. If y is an (m,1)-shaped array (column vector), x+y gives an (m,k) matrix in which y is added elementwise to each column of x.
y = np.array([10,20,30]).reshape((3,1))
x+y
array([[11, 12],
       [23, 24],
       [35, 36]])
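A common use of the first case is demeaning the columns of a data matrix. A minimal sketch, assuming the same x as above (col_means and x_demeaned are illustrative names; the axis=0 argument of mean() averages over rows, column by column):

```python
import numpy as np

x = np.array([[1,2], [3,4], [5,6]])
col_means = x.mean(axis=0)      # shape (2,): one mean per column
x_demeaned = x - col_means      # (3,2) minus (2,): subtracted from every row

print(col_means)
print(x_demeaned)
```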

Numerical Computation: matrix multiplications

  • We often need to calculate matrix multiplication in econometrics.
  • In Python, there are a few typical ways to perform matrix multiplication.
  • Let us consider a 3×2 matrix A and a 2×2 matrix B.
  • Both numpy.dot() and numpy.matmul() perform the matrix product.

Numerical Computation: matrix multiplications

A = np.array([[1,2],[3,4],[5,6]])
B = np.array([[7,8],[9,10]]) 
np.dot(A, B) 
array([[ 25,  28],
       [ 57,  64],
       [ 89, 100]])

Or,

np.matmul(A,B)
array([[ 25,  28],
       [ 57,  64],
       [ 89, 100]])

Also,

A@B
array([[ 25,  28],
       [ 57,  64],
       [ 89, 100]])

Numerical Computation: matrix multiplications

The transpose operator .T can be combined with @ to compute the product of A and the transpose of B:

A@B.T # .T is applied first, then the matrix product
array([[ 23,  29],
       [ 53,  67],
       [ 83, 105]])

Numerical Computation: outer products

The numpy.outer() function computes the outer product of two 1d-arrays, two row vectors, or two column vectors:

a = np.array([1,2,3,4])
b = np.array([10,20,30,40])
print(np.outer(a,b))
[[ 10  20  30  40]
 [ 20  40  60  80]
 [ 30  60  90 120]
 [ 40  80 120 160]]

Numerical Computation: outer products

np.outer(a.reshape(1,4),b.reshape(1,4))==np.outer(a.reshape(4,1),b.reshape(4,1))
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

Numerical Computation: inner products

The numpy.inner() function computes the inner product of two 1d-arrays or two row vectors:

# a and b are both 1d-arrays
print(np.inner(a,b))
# a and b are both 2d-arrays holding row vectors
print(np.inner(a.reshape(1,4),b.reshape(1,4)))
300
[[300]]

Numerical Computation: matrix multiplication

  • If the inputs of numpy.dot and numpy.matmul are 1d-arrays,
  • both functions return the inner product as well:
np.dot(a,b), np.matmul(a,b), a@b
(300, 300, 300)

Numerical Computation: matrix multiplication

numpy.vdot() flattens 2d-arrays provided as input into 1d-arrays before computing the dot product.

a_mat = np.array([[1,2],[3,4]])
b_mat = np.array([[10,20],[30,40]])
np.vdot(a_mat,b_mat)
300

Numerical Computation: commonly used functions

  • Econometricians often need to perform simulations to verify that estimation and inference methods work well.
  • The key function numpy.random.rand(), which we will use frequently, creates an array of the given shape and populates it with random samples from a uniform distribution over the interval [0, 1).

Numerical Computation: commonly used functions

  • The example below returns a 1d-array in which each element is randomly generated from a standard uniform distribution.
np.random.rand(3)
array([0.74744715, 0.81168166, 0.78421755])

Numerical Computation: commonly used functions

Or a matrix of random uniform samples:

np.random.rand(2,2)
array([[0.49294625, 0.36978829],
       [0.64566554, 0.53672976]])

Numerical Computation: commonly used functions

  • Computers cannot generate numbers in an entirely random way.
  • They use a deterministic procedure that maps a seed number to a specific sequence of numbers.
  • This is known as a pseudo-random number generator.
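Because the procedure is deterministic, fixing the seed reproduces the same draws. A minimal sketch using numpy.random.seed() (the seed value 2024 and the names first and second are arbitrary choices for illustration):

```python
import numpy as np

np.random.seed(2024)            # the seed value itself is arbitrary
first = np.random.rand(3)

np.random.seed(2024)            # resetting the seed restarts the sequence
second = np.random.rand(3)

print(np.array_equal(first, second))
```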

Numerical Computation: copy NumPy Data

  • As introduced in the previous chapter, assigning one variable to another makes both names refer to the same object, so they share the same identifier.
  • In NumPy,
a = np.array([1,2])
b = a
print(id(a)==id(b))
True

Numerical Computation: copy NumPy Data

Modifying b in place is reflected in a:

b.shape = 2,1
a,b
(array([[1],
        [2]]),
 array([[1],
        [2]]))

Numerical Computation: copy NumPy Data

However, redefining b does not affect a:

a = np.array([1,2])
b = a
b = b+1
a,b
(array([1, 2]), array([2, 3]))
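To obtain an independent array that can be modified in place without touching the original, one option is the array's copy() method. A minimal sketch:

```python
import numpy as np

a = np.array([1,2])
b = a.copy()            # b is a new array with its own data
b[0] = 99               # in-place change to b only

print(id(a) == id(b))
print(a, b)
```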

Numerical Computation: concatenate arrays

  • NumPy's hstack() and vstack() functions append arrays horizontally and vertically.
# Integers 1 to 12 arranged as a 3-by-4 matrix
A = np.arange(1,13).reshape(3,4)
# Integers 13 to 18 arranged as a 3-by-2 matrix
B = np.arange(13,19).reshape(3,2)
# Integers 20 to 27 arranged as a 2-by-4 matrix
C = np.arange(20,28).reshape(2,4)
(A,B,C)
(array([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]]),
 array([[13, 14],
        [15, 16],
        [17, 18]]),
 array([[20, 21, 22, 23],
        [24, 25, 26, 27]]))
np.hstack((A,B,B)) # place A, B, B side by side (same number of rows)
array([[ 1,  2,  3,  4, 13, 14, 13, 14],
       [ 5,  6,  7,  8, 15, 16, 15, 16],
       [ 9, 10, 11, 12, 17, 18, 17, 18]])
np.vstack((A,C,C)) # place A, C, C on top of each other (same number of columns)
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [20, 21, 22, 23],
       [24, 25, 26, 27]])

Numerical Computation: concatenate arrays

  • The hstack and vstack examples can be reproduced using the numpy.concatenate((a1, a2, ...), axis=0) function.
  • We need to specify the axis along which the arrays are joined:
    • axis=1 for the horizontal axis, and
    • axis=0 for the vertical axis.

Numerical Computation: concatenate arrays

np.concatenate((A,B,B), axis=1)
array([[ 1,  2,  3,  4, 13, 14, 13, 14],
       [ 5,  6,  7,  8, 15, 16, 15, 16],
       [ 9, 10, 11, 12, 17, 18, 17, 18]])
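The vertical case works the same way with axis=0. A short sketch that rebuilds A and C as defined earlier and compares the result with vstack() (the name stacked is illustrative):

```python
import numpy as np

A = np.arange(1,13).reshape(3,4)
C = np.arange(20,28).reshape(2,4)

stacked = np.concatenate((A,C,C), axis=0)   # join along the vertical axis
print(np.array_equal(stacked, np.vstack((A,C,C))))
print(stacked.shape)
```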

Numerical Computation: concatenate arrays

To repeat the same array, the numpy.tile() function is a convenient option.

D = np.arange(1,7).reshape(2,3)
np.tile(D,(3,1)), np.tile(D,(1,3)), np.tile(D,(3,2))
(array([[1, 2, 3],
        [4, 5, 6],
        [1, 2, 3],
        [4, 5, 6],
        [1, 2, 3],
        [4, 5, 6]]),
 array([[1, 2, 3, 1, 2, 3, 1, 2, 3],
        [4, 5, 6, 4, 5, 6, 4, 5, 6]]),
 array([[1, 2, 3, 1, 2, 3],
        [4, 5, 6, 4, 5, 6],
        [1, 2, 3, 1, 2, 3],
        [4, 5, 6, 4, 5, 6],
        [1, 2, 3, 1, 2, 3],
        [4, 5, 6, 4, 5, 6]]))

Numerical Computation: concatenate arrays

To repeat each element of an array instead, the numpy.repeat() function can be used.

D = np.arange(1,7).reshape(2,3)
np.repeat(D,3,axis=1), np.repeat(D,3,axis=0), np.repeat(D,3)
(array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
        [4, 4, 4, 5, 5, 5, 6, 6, 6]]),
 array([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [4, 5, 6],
        [4, 5, 6],
        [4, 5, 6]]),
 array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6]))

Numerical Computation: descriptive statistics of arrays

Generate the data:

# Set the random seed so that the generated numbers can be reproduced
np.random.seed(1212)
# Draw a 4-by-3 matrix from the uniform distribution on [0,1)
x = np.random.rand(4,3)

Numerical Computation: descriptive statistics of arrays

  • We compute the following statistics:
    • mean(x) returns the sample mean
    • std(x) returns the standard deviation (dividing by n by default)
    • sum(x) returns the sum
    • amin(x) and amax(x) return the minimum and maximum values
    • ptp(x) returns the range (maximum minus minimum of x)
    • percentile(x,q) returns the q-th percentile

Numerical Computation: descriptive statistics of arrays

print("Mean=%f \nStd. Dev.=%f \nSum=%f \nMin=%f \nMax=%f \nRange=%f \nMedian=%f"
     % (np.mean(x), np.std(x), np.sum(x),
        np.amin(x), np.amax(x),
        np.ptp(x), np.percentile(x,50)))
Mean=0.512636 
Std. Dev.=0.311660 
Sum=6.151627 
Min=0.085679 
Max=0.969521 
Range=0.883842 
Median=0.503443
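The calls above treat x as one pool of 12 numbers. Two options that often matter in practice are sketched below: axis=0 computes a statistic column by column, and ddof=1 makes std() divide by n-1 rather than the default n. The array z is made up for illustration.

```python
import numpy as np

z = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

col_mean = np.mean(z, axis=0)            # one mean per column
std_pop = np.std(z[:,0])                 # divides by n (default ddof=0)
std_sample = np.std(z[:,0], ddof=1)      # divides by n-1

print(col_mean, std_pop, std_sample)
```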

Numerical Computation: efficiency

  • In numerical computing,
    • users may wish to speed up their Python code, since the data can be large or the algorithm complicated.
  • Here we start with an illustrative example.
  • Consider the series $\sum_{i=0}^{n-1} i^2$ for a large value of $n$.


Numerical Computation: efficiency (example 1)

  1. for-loop
  2. List comprehension
  3. NumPy

Numerical Computation: efficiency (example 1)

def forloop(n):
    sig = 0 # initial value is 0
    for i in range(n): # loop over 0, 1, ..., n-1
        sig = float(i)*float(i) + sig # add the square of i to the running total
    return(sig)
def listcomp(n):
    return(sum([float(x)*x for x in range(n)]))
def numpymethod(n):
    return(np.sum(np.arange(0, n, dtype='d')**2))

It is important to always check the functions before implementing the main program. In this case,

res1 = forloop(1000)
res2 = listcomp(1000)
res3 = numpymethod(1000)
res1, res2, res3
(332833500.0, 332833500.0, 332833500.0)

Numerical Computation: efficiency (example 1)

In order to track the speed of every method, we can incorporate a timer using the built-in time module.

import time # import the time module
def timer(f, args): # first argument is the target function, second is the tuple of its arguments
    starttime = time.time() # record the starting time
    y = f(*args) # unpack the tuple args as the input arguments
    return(time.time() - starttime)

Numerical Computation: efficiency (example 1)

The main program starts from here:

n = 1000000
forloop_time  = timer(forloop,(n,))
listcomp_time = timer(listcomp,(n,))
numpy_time   = timer(numpymethod,(n,))

print("n is set to be %d" % n)
print("for-loop takes %6.5f seconds." % forloop_time)
print("List comprehension takes %6.5f seconds." % listcomp_time)
print("NumPy takes %6.5f seconds." % numpy_time)
n is set to be 1000000
for-loop takes 0.22419 seconds.
List comprehension takes 0.18575 seconds.
NumPy takes 0.01047 seconds.
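A single measurement can be noisy, so timings like those above vary from run to run and machine to machine. One possible refinement, sketched here under the assumption that the best of a few runs is a fair summary, uses time.perf_counter() from the same time module. The function timer_best and its repeats parameter are illustrative and not part of the slides.

```python
import time
import numpy as np

def numpymethod(n):
    return np.sum(np.arange(0, n, dtype='d')**2)

def timer_best(f, args, repeats=5):
    """Return the shortest running time of f(*args) over several runs."""
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()      # high-resolution clock
        f(*args)
        best = min(best, time.perf_counter() - start)
    return best

print("NumPy takes %6.5f seconds." % timer_best(numpymethod, (1000000,)))
```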

Numerical Computation: efficiency (example 2)

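  • Computing summations is a common task in econometric programming.
  • Suppose we are interested in computing the quantities $S_1, \dots, S_K$, where each $S_k$, for $k = 1, \dots, K$, has the double-summation representation $S_k = \sum_{i=1}^{n} \sum_{j=1}^{n} (X_{ik} - c_k)(X_{jk} - c_k)$.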


Numerical Computation: efficiency (example 2)

  • As an illustrative example, we set $n = 500$ and $K = 4$.
n = 500
K = 4
np.random.seed(12345)
X = np.random.rand(n,K)
c = np.arange(K)+10
  • Python is generally limited to a single core when processing code.

Numerical Computation: efficiency (example 2)

def doublefor():
    Sum = np.zeros([1,K])
    for i in range(n):
        for j in range(n):
            for k in range(K):
                Sum[:,k] = Sum[:,k]+(X[i,k]-c[k])*(X[j,k]-c[k])
    return Sum
starttime = time.time()
[doublefor() for i in range(10)]
endtime = time.time()-starttime
print("for-loop takes %6.5f seconds:" % endtime)
print("double sum is",doublefor() )
for-loop takes 25.01792 seconds:
double sum is [[22519883.25492541 27621066.13247766 33021514.42390825 39079057.04898941]]

Numerical Computation: efficiency (example 2)

  • If $n$ is larger,
    • the computational speed of this function can be even slower.
  • However, NumPy uses multiple CPUs for specific operations and could generally be much faster.

The code presented below runs much faster than the doublefor function:

def matcomp3d():
    M = np.zeros([n,1,K])
    L = np.zeros([n,n,K])
    Sum = np.zeros([K])
    for k in range(K):
        M[:,:,k] = (X[:,k]-c[k]).reshape(n,1)
        L[:,:,k] = np.dot(M[:,:,k],M[:,:,k].T)
        Sum[k] = np.sum(L[:,:,k])
    return Sum
starttime = time.time()
[matcomp3d() for i in range(10)]
endtime = time.time()-starttime
print("matrix computation takes %6.5f seconds:" % endtime)
print("double sum is",doublefor() )
matrix computation takes 0.05578 seconds:
double sum is [[22519883.25492541 27621066.13247766 33021514.42390825 39079057.04898941]]
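Because each summand is the product of a term in i and a term in j, the double sum over (X[i,k]-c[k])*(X[j,k]-c[k]) factorizes into the square of the single sum over (X[i,k]-c[k]). Under that observation, which is not part of the slides, the whole computation reduces to one column sum. A sketch (the function name colsum_squared and the sample size n_small are illustrative), checked against a direct double loop on a small sample:

```python
import numpy as np

np.random.seed(12345)
n_small, K = 20, 4
X = np.random.rand(n_small, K)
c = np.arange(K) + 10

def colsum_squared(X, c):
    # Broadcasting subtracts c[k] from column k; then sum over rows and square
    return np.sum(X - c, axis=0)**2

# Direct double loop for comparison
brute = np.zeros(K)
for i in range(n_small):
    for j in range(n_small):
        brute += (X[i,:] - c) * (X[j,:] - c)

print(np.allclose(colsum_squared(X, c), brute))
```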

Data Manipulations

  • This section introduces a third-party Python package, Pandas.
  • This package is built on top of NumPy and offers a variety of data structures
    • that are well suited to time series and spreadsheet-style analysis.

Data Manipulations: Pandas

  • Pandas has two main data structures:
    • Series and
    • DataFrame.
  • A Series is made up of an index and its corresponding data values.
  • A DataFrame extends the Series to two dimensions.

Data Manipulations: Pandas Series and DataFrame

import pandas as pd
contents = np.array([1,5,np.nan,6])
# Pandas Series
pd.Series(contents)
# Pandas DataFrame
pd.DataFrame(contents)

Data Manipulations: Pandas data structure

We can define names or labels for both rows and columns in a DataFrame:

df = pd.DataFrame([1,5,np.nan,6],
                  index=['row1','row2','row3','row4'],
                  columns=['col1'])
df

Data Manipulations: Pandas data structure

Please note that, when a DataFrame is built from a dictionary,

  • the dictionary keys serve as the column headings or labels of the DataFrame, and
  • each column holds the list of values stored under that key.

To view the labels:

df.index, df.columns  
(Index(['row1', 'row2', 'row3', 'row4'], dtype='object'),
 Index(['col1'], dtype='object'))

Data Manipulations: transpose the DataFrame

A DataFrame can be transposed with the .T attribute, just like a matrix.

df.T # transpose

Data Manipulations: Pandas summary statistics

Summary statistics can be displayed with the describe() method:

df.describe().T 

A different method of defining a DataFrame involves using dictionaries:

data = {'Name':['Tom','Mary','John','Bill'],
        'Age':[20,21,19,18]} 
pd.DataFrame(data) 

Data Manipulations: Pandas data structure

One can select part of the dictionary to construct a DataFrame:

data = {'Name':['Tom','Mary','John','Bill'],
        'Class':['I','I','II','II'],
        'Age':[20,21,19,18],
        'GPA':[4.1,3.2,4.0,3.8]}
pd.DataFrame(data, columns=['Name','GPA'],
             index=['001','002','003','004'])

Data Manipulations: Pandas data structure

Alternatively, it is possible to define an empty DataFrame and then fill in the columns:

import random
# Create an empty DataFrame
df = pd.DataFrame()
# Fill in the contents of each column
df['ID'] = np.arange(1,5,1)
df['Random'] = random.sample(range(1,1000),4)
df['Gender'] = ['Male','Female','Male','Male']
df['Male'] = [1,0,1,1]
df

To create a DataFrame in which observations are indexed by time periods, we can start by generating a date/time object (using the pandas.date_range() function):


Data Manipulations: Pandas date objects

dates = pd.date_range('20220401', periods=3, freq="W")
dates
DatetimeIndex(['2022-04-03', '2022-04-10', '2022-04-17'], dtype='datetime64[ns]', freq='W-SUN')

Then we pass this date object as the argument for an index of the DataFrame.

df = pd.DataFrame(np.random.randn(3,4), index=dates,
                  columns=list('ABCD'))

Data Manipulations: Pandas functions

  • DataFrames can often become large.
  • With the head(#) and tail(#) functions, you can easily browse the top and bottom rows of a DataFrame.
  • For example,
df.head(2)

We can also select rows and columns by their labels with the .loc method:

df.loc['20220403':'20220410',['A', 'B']]

Data Manipulations: Pandas functions

  • Another method, similar to .loc, is .iloc.
    • One specifies positions using integers
    • or the usual Python indexing and slicing rules.
df.iloc[[0,1],[0,1]]
  • Note that the .at method works only with row and column labels.
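The three accessors can be compared side by side. The sketch below uses a made-up DataFrame named df_demo (not the randomly generated df from the slides) so that the selected values are easy to read.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20220401', periods=3, freq="W")
df_demo = pd.DataFrame(np.arange(12).reshape(3,4), index=dates,
                       columns=list('ABCD'))

by_label = df_demo.loc['20220403':'20220410', ['A','B']]   # label-based; both ends included
by_position = df_demo.iloc[0:2, 0:2]                        # position-based; end excluded
one_cell = df_demo.at[dates[0], 'B']                        # a single value by labels

print(by_label.equals(by_position))
print(one_cell)
```

Note the difference in slicing: the label slice in .loc includes its end point, while the integer slice in .iloc follows the usual Python rule and excludes it.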

Data Manipulations: Pandas functions

  • The DataFrame can be selected according to the logical conditions.
df[df >= 0]

In the code below:

  • The DataFrame df is passed to the melt() function.
  • id_vars names the variable that is left unaltered, here Name.
  • var_name is the name of the new column that holds the original column names.
  • value_name is the name of the new column that holds their values.

Data Manipulations: Pandas data transformation

  • Melt the data:
df2 = pd.melt(df,id_vars=['Name'], var_name='Variables',
              value_name='Values')
  • Transform the data from long back to wide using pivot():
df2.pivot(index='Name', columns='Variables', values='Values')
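A self-contained sketch of the round trip from wide to long and back. The DataFrame students reuses the names, ages, and GPAs from the earlier dictionary example, and the variable names long and wide are illustrative.

```python
import pandas as pd

students = pd.DataFrame({'Name':['Tom','Mary','John','Bill'],
                         'Age':[20,21,19,18],
                         'GPA':[4.1,3.2,4.0,3.8]})

# Wide to long: one row per (Name, variable) pair
long = pd.melt(students, id_vars=['Name'], var_name='Variables',
               value_name='Values')

# Long back to wide: one row per Name, one column per variable
wide = long.pivot(index='Name', columns='Variables', values='Values')

print(long.shape, wide.shape)
```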

Data Importing and Exporting

  • In real economic analysis,
    • the very first step is to import a dataset into the Python session.
  • The dataset itself could be in
    • the plain text format (e.g., csv format)
    • or the Excel spreadsheet (e.g., .xlsx format).

Data Importing and Exporting: read CSV files

Now we use the pandas.read_csv() function to import the CSV formatted dataset.

data_pd = pd.read_csv("dependency/data_wageedu.csv")
# Show the first 5 rows of the data
data_pd.head(5)

Data Importing and Exporting: read CSV files

  • NumPy's numpy.genfromtxt() function can also read the CSV file.
  • Note that the first row of this CSV file holds the variable names, i.e.,
    • logwage and edu, which genfromtxt() treats as invalid numerical values (nan).
  • The delimiter option sets the string used to separate values in the text file.
  • Since values in the CSV file are separated by commas,
    • we set delimiter=",".

Data Importing and Exporting: read CSV files

data_np = np.genfromtxt("dependency/data_wageedu.csv",
                        delimiter=",")
# Show the first 5 rows of the data
data_np[:5,:]
array([[      nan,       nan],
       [ 2.868216, 12.      ],
       [ 2.358269,  7.      ],
       [ 2.732919, 12.      ],
       [ 2.416616, 10.      ]])
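The row of nan values comes from the header line. One way to avoid it is the skip_header option of numpy.genfromtxt(). The sketch below uses an in-memory string (csv_text, an illustrative stand-in for the actual file) with the same header names.

```python
import io
import numpy as np

# A short stand-in for the CSV file, with the same header names
csv_text = "logwage,edu\n2.868216,12\n2.358269,7\n"

# skip_header=1 drops the first line, so no row of nan values appears
data_demo = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data_demo)
```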

Data Importing and Exporting: read Excel files

  • To import an Excel spreadsheet, we can use the pandas.read_excel() function.
  • The sheet_name option allows users to specify the particular sheet to import.
data2_pd = pd.read_excel("dependency/data_lifesat.xls",
                         sheet_name="Sheet1")
# Show the first 5 rows of the data
data2_pd.head(5)

Data Importing and Exporting: write data files

  • To save a Pandas' DataFrame as a CSV file,
    • we can use the pandas.DataFrame.to_csv() function.
  • The pandas.DataFrame.to_excel()
    • exports DataFrame to an Excel spreadsheet.
# Convert the NumPy array to a DataFrame
df = pd.DataFrame(data_np)
# Save the DataFrame as a CSV file
df.to_csv("data_df.csv")
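By default to_csv() also writes the row index as an extra first column. The sketch below does a round trip through an in-memory buffer, using index=False to leave the index out. Here io.StringIO stands in for a file on disk, and df_demo and df_back are illustrative names with made-up contents.

```python
import io
import pandas as pd

df_demo = pd.DataFrame({'logwage':[2.868216, 2.358269], 'edu':[12, 7]})

buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)     # write without the row index
buffer.seek(0)                          # rewind before reading back

df_back = pd.read_csv(buffer)
print(list(df_back.columns))
print(df_back.shape)
```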

Data Importing and Exporting: write data files

  • Now convert an array to a CSV file using NumPy's savetxt() function.
  • We save the array variable as a CSV file named data_np.csv.
np.savetxt("data_np.csv", data_np,
           delimiter = ",")

Pass Data Between Python and Stata

  • After configuring the Python-Stata integration, it is possible to
    • transfer the current dataset in Stata to Python in the form of NumPy arrays or Pandas DataFrames.
  • Now we save the current dataset in Stata as a Pandas DataFrame variable.

Pass Data Between Python and Stata

# Configure the Python-Stata integration
import os
os.chdir('/Applications/Stata/utilities')
from pystata import config
config.init('mp')
# Load the auto.dta dataset in Stata
from pystata import stata
stata.run('''sysuse auto, clear''')
# Transfer the current Stata dataset to Python
auto_pd = stata.pdataframe_from_data()
auto_pd.head()
(1978 automobile data)

Pass Data Between Python and Stata

  • Conversely,
    • NumPy arrays or Pandas DataFrames can be sent to Stata.
  • Below, we load the Pandas DataFrame auto_pd into Stata.
stata.pdataframe_to_data(auto_pd, force=True)
stata.run('list in 1/2')
     +------------------------------------------------------------------------+
  1. |        make | price | mpg | rep78 | headroom | trunk | weight | length |
     | AMC Concord |  4099 |  22 |     3 |      2.5 |    11 |   2930 |    186 |
     |------------------------------------------------------------------------|
     |     turn     |     displa~t     |     gear_ra~o     |     foreign      |
     |       40     |          121     |     3.5799999     |           0      |
     +------------------------------------------------------------------------+

     +------------------------------------------------------------------------+
  2. |        make | price | mpg | rep78 | headroom | trunk | weight | length |
     |   AMC Pacer |  4749 |  17 |     3 |        3 |    11 |   3350 |    173 |
     |------------------------------------------------------------------------|
     |     turn     |     displa~t     |     gear_ra~o     |     foreign      |
     |       40     |          258     |          2.53     |           0      |
     +------------------------------------------------------------------------+