How to Use NumPy for Beginners

I. What is NumPy? [1]

NumPy is the fundamental package for scientific computing with Python. It includes:

a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

II. What is SciPy? [2]

SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. Some of its core packages are:

NumPy (Base N-dimensional array package)
SciPy library (Fundamental library for scientific computing)
Matplotlib (Comprehensive 2D Plotting)
IPython (Enhanced Interactive Console)
Sympy (Symbolic mathematics)
pandas (Data structures & analysis)

That's all for now; to learn more about SciPy, see [2] or [3].

III. NumPy and SciPy documentation [3]

The manuals and documentation for NumPy and SciPy are available at reference [2].

In the Complete NumPy Manual, open the NumPy User Guide and then choose Quickstart tutorial for an introductory walkthrough of NumPy.

IV. Some tutorials for getting started with NumPy [4]

Prerequisites

Before reading this tutorial you should know a bit of Python. If you would like to refresh your memory, take a look at the Python tutorial.

If you wish to work the examples in this tutorial, you must also have some software installed on your computer. Please see http://scipy.org/install.html for instructions.

I installed NumPy because the caffe library requires it. Following that guide, I installed Miniconda, a sibling of Anaconda (the distribution mentioned in the SciPy install instructions linked above). Both are built on conda, an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them.

The Basics

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes. The number of axes is rank.

Three things to note:

NumPy's main object is an array that is multidimensional and homogeneous. Homogeneous means “all of the same type”: every element in the array has the same type.
Elements are indexed by a tuple of non-negative integers. For a two-dimensional array the tuple is (row, col).
Terminology: NumPy calls a dimension an axis (plural: axes) and calls the number of dimensions the rank.

NumPy's array class is called ndarray (n-dimensional array) to distinguish it from the Python standard library's array.array, which only handles one-dimensional arrays.

Some ndarray attributes worth knowing:

ndarray.ndim

· the number of dimensions (axes) of the array, i.e. its rank.

ndarray.shape

· the dimensions of the array, or more precisely a tuple giving the size of the array along each dimension.

· for example, a matrix with n rows and m columns has shape (n,m).

· the length of the shape tuple is therefore the rank (ndim).

ndarray.size

· the total number of elements in the array.

ndarray.dtype

· an object describing the type of the elements in the array.

· a dtype can be one of the standard Python types or one of NumPy's own types, such as numpy.int32, numpy.int16, and numpy.float64.

· my guess is that it is better to use NumPy's own types, since NumPy and SciPy are presumably optimized around the data types they define themselves.

ndarray.itemsize

· the size in bytes of each element of the array.

· for example, an array with dtype float64 has itemsize 8 (=64/8).

ndarray.data

· the buffer containing the actual elements of the array.

· normally we won't need this attribute, because accessing elements through indexing is more convenient.
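
To make the attributes above concrete, here is a minimal sketch. The array name a, the 3 x 5 shape, and the explicit int64 dtype are my own choices for illustration; they do not come from the tutorial text.

```python
import numpy as np

# A 3 x 5 array of the integers 0..14. int64 is requested explicitly so that
# itemsize is the same on every platform.
a = np.arange(15, dtype=np.int64).reshape(3, 5)

print(a.ndim)      # number of axes: 2
print(a.shape)     # size along each axis: (3, 5)
print(a.size)      # total number of elements: 15
print(a.dtype)     # element type: int64
print(a.itemsize)  # bytes per element: 8 (= 64 / 8)
```

Note that len(a.shape) equals a.ndim, which matches the remark that the length of shape is the rank.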

Array Creation

There are several ways to create arrays.

· you can create an array from a regular Python list or tuple using the array function

· The type of the resulting array is deduced from the type of the elements in the sequences

· Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content

· zeros

· ones

· empty (uninitialized, output may vary)

· arange (returns arrays instead of lists)

· np.arange( 0, 2, 0.3 ) # numbers from 0 up to (but not including) 2, step = 0.3

· linspace

· np.linspace( 0, 2, 9 ) # 9 numbers from 0 to 2

· When arange is used with floating point arguments, it is generally not possible to predict the number of elements obtained, due to the finite floating point precision. For this reason, it is usually better to use the function linspace that receives as an argument the number of elements that we want, instead of the step

· random

· np.random.random((2,3))
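
The creation functions listed above can be tried side by side. This is only a sketch; the variable names and the sample values are assumptions of mine.

```python
import numpy as np

from_list = np.array([1, 2, 3])          # dtype deduced as an integer type
from_floats = np.array([1.5, 2.0, 3.0])  # dtype deduced as float64

zeros = np.zeros((2, 3))   # 2 x 3 array filled with 0.0
ones = np.ones((2, 3))     # 2 x 3 array filled with 1.0

# With a float step the element count depends on rounding, so linspace,
# where the count is given directly, is easier to reason about.
stepped = np.arange(0, 2, 0.3)   # start at 0, stop before 2, step 0.3
spaced = np.linspace(0, 2, 9)    # exactly 9 points, both endpoints included

print(stepped)
print(spaced)
```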

See also

array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, numpy.random.rand, numpy.random.randn, fromfunction, fromfile

Printing Arrays

When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:

· the last axis is printed from left to right,

· the second-to-last is printed from top to bottom,

· the rest are also printed from top to bottom, with each slice separated from the next by an empty line.

· also, when you print a list the elements are separated by commas, whereas when you print an array they are separated by spaces

One-dimensional arrays are then printed as rows, bidimensionals as matrices and tridimensionals as lists of matrices.

If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners

· To disable this behaviour and force NumPy to print the entire array, you can change the printing options using set_printoptions

· np.set_printoptions(threshold=sys.maxsize) # requires "import sys"

Basic Operations

Arithmetic operators on arrays apply elementwise (thực hiện trên từng phần tử). A new array is created and filled with the result.

· Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product can be performed using the dot function or method

· Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.

· Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class

· a.sum()

· a.min()

· a.max()

· By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array

· b.sum(axis=0)

This is really nice: you compute with arrays the way you compute with numbers, with no for/while loops to walk through the elements.

Note that the dot product means different things for vectors and for matrices.

· the dot product of two vectors is the sum of their elementwise products.

· the dot product of two matrices is ordinary matrix multiplication: each entry is the dot product of a row vector of the first matrix with a column vector of the second.

When a unary operation is applied along a given axis, that axis is collapsed: the result has one value per position along the remaining axes.
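
Here is a small sketch of the points above: elementwise product versus dot product, and what the axis parameter does. The arrays A, B, u, v and b are example values I picked, not ones from the tutorial.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B   # multiplies matching entries
matrix = A.dot(B)     # ordinary matrix multiplication

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
vector_dot = u.dot(v)  # 1*4 + 2*5 + 3*6

b = np.arange(12).reshape(3, 4)
col_sums = b.sum(axis=0)  # collapses axis 0: one sum per column
row_sums = b.sum(axis=1)  # collapses axis 1: one sum per row

print(elementwise)
print(matrix)
print(vector_dot, col_sums, row_sums)
```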

Universal Functions

NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal functions” (ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.
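
A short sketch of ufuncs in action. The input array x is an arbitrary example of mine.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])

exp_x = np.exp(x)          # e**0, e**1, e**2, computed elementwise
sqrt_x = np.sqrt(x)        # elementwise square root
total = np.add(x, sqrt_x)  # binary ufunc; same result as x + sqrt_x

print(exp_x)
print(total)
```

Each call returns a new array with the same shape as its input.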

See also

all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where

Indexing, Slicing and Iterating

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

· a = np.arange(10)**3

· a[2]

· a[2:5] # elements at indices 2, 3, 4 (the end index 5 is excluded)

· a[:6:2] = -1000 # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000

· a[ : :-1] # reversed a

· >>> for i in a:

· ...     print(i**(1/3.))

· ...

· nan

· 1.0

· nan

· ...

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas

· >>> def f(x,y):

· ...     return 10*x+y

· ...

· >>> b = np.fromfunction(f,(5,4),dtype=int)

· b[2,3]

· b[0:5, 1] # each row in the second column of b

· b[ : ,1] # equivalent to the previous example

· b[1:3, : ] # each column in the second and third row of b

· b[-1] # the last row. Equivalent to b[-1,:]

The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the remaining axes. NumPy also allows you to write this using dots as b[i,...].

The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is a rank 5 array (i.e., it has 5 axes), then

· x[1,2,...] is equivalent to x[1,2,:,:,:],

· x[...,3] to x[:,:,:,:,3] and

· x[4,...,5,:] to x[4,:,:,5,:].

· >>> c = np.array( [[[ 0, 1, 2], # a 3D array (two stacked 2D arrays)

· ... [ 10, 12, 13]],

· ... [[100,101,102],

· ... [110,112,113]]])

· >>> c.shape

· (2, 2, 3)
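
The dots notation can be checked on the 3D array c. A minimal sketch; the names first_block and last_column are mine.

```python
import numpy as np

c = np.array([[[0, 1, 2],
               [10, 12, 13]],
              [[100, 101, 102],
               [110, 112, 113]]])

first_block = c[1, ...]   # same as c[1, :, :] or c[1]
last_column = c[..., 2]   # same as c[:, :, 2]

print(first_block)
print(last_column)
```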

Iterating over multidimensional arrays is done with respect to the first axis

· >>> for row in b:

· ...     print(row)

· ...

· [0 1 2 3]

· [10 11 12 13]

However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is an iterator over all the elements of the array:

· >>> for element in b.flat:

· ...     print(element)

· ...

· 0

· 1

· 2

· 3

See also

Indexing, Indexing (reference), newaxis, ndenumerate, indices

Shape Manipulation

Changing the shape of an array

The shape of an array can be changed with various commands

· >>> a = np.floor(10*np.random.random((3,4)))

· >>> a

· array([[ 2., 8., 0., 6.],

· [ 4., 5., 1., 1.],

· [ 8., 9., 3., 6.]])

· >>> a.shape

· (3, 4)

· >>> a.ravel() # flatten the array

· array([ 2., 8., 0., 6., 4., 5., 1., 1., 8., 9., 3., 6.])

· a.shape = (6, 2)

· a.T

The order of the elements in the array resulting from ravel() is normally “C-style”, that is, the rightmost index “changes the fastest”, so the element after a[0,0] is a[0,1]. If the array is reshaped to some other shape, again the array is treated as “C-style”. NumPy normally creates arrays stored in this order, so ravel() will usually not need to copy its argument, but if the array was made by taking slices of another array or created with unusual options, it may need to be copied.

The reshape function returns its argument with a modified shape, whereas the ndarray.resize method modifies the array itself:

· a.reshape(3,-1) # If a dimension is given as -1 in a reshaping operation, that dimension is automatically calculated from the others

· a.resize((2,6))
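
The difference between reshape (returns a reshaped array) and resize (modifies in place) is easy to miss, so here is a small sketch. I use np.arange(12) instead of random numbers so the result is predictable; that choice is mine.

```python
import numpy as np

a = np.arange(12)

r = a.reshape(3, -1)   # -1 is inferred as 4; a itself keeps shape (12,)
flat = r.ravel()       # back to one dimension, in C-style order

b = np.arange(12)
b.resize((2, 6))       # changes b in place and returns None

print(a.shape, r.shape, b.shape)
```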

See also

ndarray.shape, reshape, resize, ravel

Stacking together different arrays

Several arrays can be stacked together along different axes

· >>> a = np.floor(10*np.random.random((2,2)))

· >>> b = np.floor(10*np.random.random((2,2)))

· np.vstack((a,b))

· np.hstack((a,b))

The function column_stack stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D arrays

· from numpy import newaxis

· np.column_stack((a,b)) # With 2D arrays

· >>> a = np.array([4.,2.])

· >>> b = np.array([2.,8.])

· >>> a[:,newaxis] # This turns the 1D array a into a 2D column vector

· >>> np.column_stack((a,b))

· >>> np.column_stack((a[:,newaxis],b[:,newaxis]))

· # the last two calls give the same result

For arrays with more than two dimensions, hstack stacks along their second axes, vstack stacks along their first axes, and concatenate allows for an optional argument giving the number of the axis along which the concatenation should happen.

In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They allow the use of range literals (":"):

· >>> np.r_[1:4,0,4]

· array([1, 2, 3, 0, 4])

When used with arrays as arguments, r_ and c_ are similar to vstack and hstack in their default behavior, but allow for an optional argument giving the number of the axis along which to concatenate.
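
A sketch that puts the stacking functions next to each other. The fixed arrays a and b are my own example values, chosen so the results are predictable.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

v = np.vstack((a, b))    # shape (4, 2): b is placed below a
h = np.hstack((a, b))    # shape (2, 4): b is placed to the right of a

# concatenate takes the axis explicitly.
same_as_v = np.concatenate((a, b), axis=0)
same_as_h = np.concatenate((a, b), axis=1)

# column_stack turns 1D arrays into the columns of a 2D array.
cols = np.column_stack((np.array([4.0, 2.0]), np.array([2.0, 8.0])))

print(v.shape, h.shape)
print(cols)
```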

See also

hstack, vstack, column_stack, concatenate, c_, r_

Splitting one array into several smaller ones

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur

· a = np.floor(10*np.random.random((2,12)))

· np.hsplit(a,3) # Split a into 3

· np.hsplit(a,(3,4)) # Split a after the third and the fourth column

· # Roughly: the fourth column (index 3) becomes one array on its own, the columns before it form one array, and the columns after it form another

vsplit splits along the vertical axis, and array_split allows one to specify along which axis to split.
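
To see exactly where the cuts land, here is a sketch with a predictable 2 x 12 array. The array contents and variable names are my own assumptions.

```python
import numpy as np

a = np.arange(24).reshape(2, 12)

thirds = np.hsplit(a, 3)        # three arrays of shape (2, 4)
pieces = np.hsplit(a, (3, 4))   # columns [0:3], [3:4], [4:12]
rows = np.vsplit(a, 2)          # two arrays of shape (1, 12)

# array_split accepts an axis and tolerates uneven division.
uneven = np.array_split(a, 5, axis=1)

print([p.shape for p in pieces])
print([u.shape for u in uneven])
```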

Copies and Views

When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:

This part is a bit of a brain-twister.

No Copy at All

Simple assignments make no copy of array objects or of their data.

· >>> a = np.arange(12)

· >>> b = a # no new object is created

· >>> b is a # a and b are two names for the same ndarray object

· True

· >>> b.shape = 3,4 # changes the shape of a

· >>> a.shape

· (3, 4)

Python passes mutable objects as references, so function calls make no copy

· >>> def f(x):

· ...     print(id(x))

· ...

· >>> id(a) # id is a unique identifier of an object

· 148293216

· >>> f(a)

· 148293216

With assignment, or when passing an array to a function, nothing is copied; you only get another reference to the same object.

View or Shallow Copy

Different array objects can share the same data. The view method creates a new array object that looks at the same data

· >>> c = a.view()

· >>> c is a

· False

· >>> c.base is a # c is a view of the data owned by a

· True

· >>> c.flags.owndata

· False

· >>>

· >>> c.shape = 2,6 # a's shape doesn't change

· >>> a.shape

· (3, 4)

· >>> c[0,4] = 1234 # a's data changes

· >>> a

· array([[ 0, 1, 2, 3],

· [1234, 5, 6, 7],

· [ 8, 9, 10, 11]])

Slicing an array returns a view of it:

· >>> s = a[ : , 1:3] # spaces added for clarity; could also be written "s = a[:,1:3]"

· >>> s[:] = 10 # s[:] is a view of s. Note the difference between s=10 and s[:]=10

· >>> a

· array([[ 0, 10, 10, 3],

· [1234, 10, 10, 7],

· [ 8, 10, 10, 11]])

With view or a slice, NumPy creates a new object, but that object uses the same data as the old one.

Deep Copy

The copy method makes a complete copy of the array and its data.

· >>> d = a.copy() # a new array object with new data is created

· >>> d is a

· False

· >>> d.base is a # d doesn't share anything with a

· False

· >>> d[0,0] = 9999

· >>> a

· array([[ 0, 10, 10, 3],

· [1234, 10, 10, 7],

· [ 8, 10, 10, 11]])

If you want a completely new object (with its own freshly allocated data), use copy.
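
The three cases can be checked in one go. This is a sketch of my own; np.shares_memory is used only as a convenient way to ask whether two arrays overlap in memory.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

alias = a            # plain assignment: same object
view = a[:, 1:3]     # slicing: new object, shared data
deep = a.copy()      # copy: new object, new data

view[:] = -1         # writes through to a
deep[0, 0] = 9999    # does not touch a

print(alias is a)                  # True
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, deep))   # False
print(a[0])                        # columns 1 and 2 are now -1
```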

Functions and Methods Overview

Here is a list of some useful NumPy functions and methods names ordered in categories. See Routines for the full list.

Array Creation

arange, array, copy, empty, empty_like, eye, fromfile, fromfunction, identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like

Conversions

ndarray.astype, atleast_1d, atleast_2d, atleast_3d, mat

Manipulations

Questions

all, any, nonzero, where

Ordering

argmax, argmin, argsort, max, min, ptp, searchsorted, sort

Operations

choose, compress, cumprod, cumsum, inner, ndarray.fill, imag, prod, put, putmask, real, sum

Basic Statistics

cov, mean, std, var

Basic Linear Algebra

cross, dot, outer, linalg.svd, vdot

That wraps up the Basic part. Phew.

Next, I'll move on to the less basic part.

Less Basic

Broadcasting rules

After a first read I didn't really understand this, so I'll go through [5] for the detailed examples and then restate the two rules in a way that is easier to follow.

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations.

Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. In other words, the loop over the elements runs inside NumPy's compiled C code, so you don't have to write that loop yourself in Python.

The documentation also acknowledges that broadcasting usually leads to efficient implementations, but that in some cases it uses memory inefficiently and slows computation down.
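
To picture what "looping in C instead of Python" means, compare a hand-written loop with the broadcast expression. A sketch with example values of my own; both versions compute the same result, only the place where the loop runs differs.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 2.0

# Explicit Python loop: one multiplication per iteration, run by the interpreter.
looped = np.empty_like(a)
for i in range(a.size):
    looped[i] = a[i] * b

# Broadcast version: the same element loop runs inside NumPy's compiled code.
vectorized = a * b

print(looped, vectorized)
```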

Let's get more concrete.

NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape.

· >>> a = np.array([1.0, 2.0, 3.0])

· >>> b = np.array([2.0, 2.0, 2.0])

· >>> a * b

· array([ 2., 4., 6.])

Broadcasting relaxes this constraint when the arrays' shapes meet certain conditions. The simplest example is an array combined with a scalar:

· >>> a = np.array([1.0, 2.0, 3.0])

· >>> b = 2.0

· >>> a * b

· array([ 2., 4., 6.])

Here is a bit about how NumPy carries out the operation above:

The result is equivalent to the previous example where b was an array. We can think of the scalar b being stretched during the arithmetic operation into an array with the same shape as a. The new elements in b are simply copies of the original scalar.

The stretching analogy is only conceptual. NumPy is smart enough to use the original scalar value without actually making copies, so that broadcasting operations are as memory and computationally efficient as possible.

The code in the second example is more efficient than that in the first because broadcasting moves less memory around during the multiplication (b is a scalar rather than an array).

In short, NumPy is smart enough not to actually expand b from a scalar into an array in the computation above. So in terms of both speed and memory, example 2 (using a scalar) is more efficient than example 1 (using an array full of 2s).

General Broadcasting Rules

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

1. they are equal, or

2. one of them is 1 (that is, the axis has length 1)

If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.

Arrays do not need to have the same number of dimensions. For example, if you have a 256x256x3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules, shows that they are compatible

· Image (3d array): 256 x 256 x 3

· Scale (1d array): 3

· Result (3d array): 256 x 256 x 3

· >>> im = np.random.random((256, 256, 3))

· >>> scale = np.random.random((3,))

· >>> im*scale

When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other

By contrast, if im above had shape (3, 256, 256), it would not be compatible with scale.
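
The sketch below checks both layouts. The scale values and the all-ones images are placeholders I made up; the workaround with two extra axes of length 1 is one possible way to make the channel-first layout broadcast, not something the tutorial prescribes.

```python
import numpy as np

scale = np.array([0.5, 1.0, 2.0])

im_last = np.ones((256, 256, 3))   # channels on the last axis
scaled_last = im_last * scale      # shapes (256, 256, 3) and (3,) line up

im_first = np.ones((3, 256, 256))  # channels on the first axis
try:
    im_first * scale               # trailing sizes 256 and 3 do not match
    failed = False
except ValueError:
    failed = True

# Giving scale two trailing axes of length 1 makes the shapes compatible:
# (3, 256, 256) against (3, 1, 1).
scaled_first = im_first * scale[:, np.newaxis, np.newaxis]

print(failed, scaled_last.shape, scaled_first.shape)
```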

In the following example, both the A and B arrays have axes with length one that are expanded to a larger size during the broadcast operation:

· A (4d array): 8 x 1 x 6 x 1

· B (3d array): 7 x 1 x 5

· Result (4d array): 8 x 7 x 6 x 5

Here are some more examples:

· A (2d array): 5 x 4

· B (1d array): 1

· Result (2d array): 5 x 4

· A (2d array): 5 x 4

· B (1d array): 4

· Result (2d array): 5 x 4

· A (3d array): 15 x 3 x 5

· B (3d array): 15 x 1 x 5

· Result (3d array): 15 x 3 x 5

· A (3d array): 15 x 3 x 5

· B (2d array): 3 x 5

· Result (3d array): 15 x 3 x 5

· A (3d array): 15 x 3 x 5

· B (2d array): 3 x 1

· Result (3d array): 15 x 3 x 5

Here are examples of shapes that do not broadcast:

· A (1d array): 3

· B (1d array): 4 # trailing dimensions do not match

· A (2d array): 2 x 1

· B (3d array): 8 x 4 x 3 # second from last dimensions mismatched

Two takeaways:

Compare the shapes from right to left.
At each position, going from right to left, the two sizes are compatible if:

· they are equal, or

· one of them is 1, or

· one of them is missing (that array has fewer dimensions)

In both the size-1 and the missing case, the smaller array is broadcast up to the larger shape by conceptually repeating its own values to fill the gap.
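
The rule summarized above is short enough to write out as code. Note that broadcast_shape is a hypothetical helper I wrote for illustration; it is not a NumPy function. The last line cross-checks it against what NumPy itself computes.

```python
import numpy as np
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError.

    Hypothetical helper written for illustration; not part of NumPy.
    """
    result = []
    # Walk both shapes from the trailing axis; a missing axis counts as 1.
    for m, n in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if m == n or n == 1:
            result.append(m)
        elif m == 1:
            result.append(n)
        else:
            raise ValueError(f"incompatible sizes {m} and {n}")
    return tuple(reversed(result))


print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))   # (8, 7, 6, 5)
print(broadcast_shape((15, 3, 5), (3, 1)))        # (15, 3, 5)

# Cross-check against NumPy itself.
check = np.broadcast(np.empty((8, 1, 6, 1)), np.empty((7, 1, 5))).shape
```

Calling broadcast_shape((3,), (4,)) raises ValueError, matching the first "do not broadcast" example.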

An example of broadcasting in practice

· >>> x = np.arange(4)

· >>> xx = x.reshape(4,1)

· >>> y = np.ones(5)

· >>> z = np.ones((3,4))

· >>> x.shape

· (4,)

· >>> y.shape

· (5,)

· >>> x + y

· ValueError: operands could not be broadcast together with shapes (4,) (5,)

· >>> xx.shape

· (4, 1)

· >>> y.shape

· (5,)

· >>> (xx + y).shape

· (4, 5)

· >>> xx + y

· array([[ 1., 1., 1., 1., 1.],

· [ 2., 2., 2., 2., 2.],

· [ 3., 3., 3., 3., 3.],

· [ 4., 4., 4., 4., 4.]])

· >>> x.shape

· (4,)

· >>> z.shape

· (3, 4)

· >>> (x + z).shape

· (3, 4)

· >>> x + z

· array([[ 1., 2., 3., 4.],

· [ 1., 2., 3., 4.],

· [ 1., 2., 3., 4.]])

Broadcasting provides a convenient way of taking the outer product (or any other outer operation) of two arrays. The following example shows an outer addition operation of two 1-d arrays

· >>> a = np.array([0.0, 10.0, 20.0, 30.0])

· >>> b = np.array([1.0, 2.0, 3.0])

· >>> a[:, np.newaxis] + b

· array([[ 1., 2., 3.],

· [ 11., 12., 13.],

· [ 21., 22., 23.],

· [ 31., 32., 33.]])

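A note on the term outer product: for two vectors a and b, it is the matrix whose (i, j) entry is a[i] * b[j]. The example above does the same thing with addition instead of multiplication.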
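
A small sketch comparing the broadcast form with np.outer, reusing the arrays a and b from the example. The names outer_sum and outer_prod are mine.

```python
import numpy as np

a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])

outer_sum = a[:, np.newaxis] + b    # entry (i, j) is a[i] + b[j]
outer_prod = a[:, np.newaxis] * b   # entry (i, j) is a[i] * b[j]

# np.outer computes the same product directly.
same = np.array_equal(outer_prod, np.outer(a, b))

print(outer_sum.shape, same)
```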
