NumPy


NumPy (Numerical Python) is like a super-powered calculator for Python. Think of it as:

  • Regular Python lists = Basic calculator
  • NumPy arrays = Scientific calculator with advanced functions

Why NumPy?

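  • Faster: Vectorized operations run in compiled code and are often 10-100x faster than equivalent Python loops over lists; the exact gain depends on the operation and array size
  • Less memory: Elements share one fixed data type and sit in a single contiguous block, so there is no per-element Python object overhead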
  • More features: Mathematical functions, linear algebra, statistics
  • Foundation: Used by pandas, matplotlib, scikit-learn, and more
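The speed and memory points above can be checked with a rough sketch like the one below. The array size, the doubling operation, and the use of `timeit` are assumptions chosen for illustration; the measured ratio will vary with hardware, NumPy version, and workload.

```python
import sys
import timeit

import numpy as np

n = 100_000  # illustrative size, not taken from the text above
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Memory: the list stores pointers to separate int objects,
# while the array stores raw 8-byte integers in one block.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_arr.nbytes

# Speed: double every element, repeated 20 times.
list_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=20)
array_time = timeit.timeit(lambda: np_arr * 2, number=20)

print(f"List memory:  {list_bytes} bytes")
print(f"Array memory: {array_bytes} bytes")
print(f"List time:  {list_time:.4f} s")
print(f"Array time: {array_time:.4f} s")
```

Timings from a single run are noisy, so treat the printed ratio as an order-of-magnitude indication only.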

Installation #

pip install numpy

1. Getting Started #

Importing NumPy #

import numpy as np  # Standard convention

Your First NumPy Array #

# From Python list
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)
print(my_array)  # [1 2 3 4 5]
print(type(my_array))  # <class 'numpy.ndarray'>

# Direct creation
arr = np.array([10, 20, 30, 40, 50])
print(arr)  # [10 20 30 40 50]

Real-world analogy: Like converting a shopping list (Python list) into a spreadsheet (NumPy array) for better organization and calculations.

2. Creating Arrays #

Different Ways to Create Arrays #

# 1. From lists
arr1d = np.array([1, 2, 3])  # 1D array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])  # 2D array

# 2. Zeros and Ones
zeros = np.zeros(5)  # [0. 0. 0. 0. 0.]
ones = np.ones((2, 3))  # 2x3 array of ones
"""
[[1. 1. 1.]
 [1. 1. 1.]]
"""

# 3. Range of numbers
range_arr = np.arange(0, 10, 2)  # [0 2 4 6 8] (start, stop, step)
linspace_arr = np.linspace(0, 10, 5)  # [0. 2.5 5. 7.5 10.] (evenly spaced)

# 4. Random numbers
random_arr = np.random.random(5)  # 5 random floats in [0, 1)
random_int = np.random.randint(1, 10, 5)  # 5 random integers from 1 to 9 (10 is excluded)

# 5. Identity matrix
identity = np.eye(3)  # 3x3 identity matrix
"""
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
"""

# 6. Full arrays
full_arr = np.full((2, 3), 7)  # 2x3 array filled with 7
"""
[[7 7 7]
 [7 7 7]]
"""

Real-world Examples: #

# Student grades for 3 subjects, 4 students
grades = np.array([
    [85, 90, 78],  # Student 1
    [92, 88, 91],  # Student 2
    [76, 82, 89],  # Student 3
    [88, 85, 87]   # Student 4
])

# Monthly sales data (12 months)
sales = np.array([15000, 18000, 22000, 25000, 23000, 27000,
                  30000, 28000, 31000, 29000, 26000, 24000])

# Temperature readings (7 days, 4 times per day)
temperatures = np.random.uniform(20, 35, (7, 4))  # Random temps 20-35°C
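As a possible alternative to the `np.random.*` functions used above, NumPy also provides a `Generator` object created with `np.random.default_rng`. The sketch below mirrors the random examples from this section; the seed value 42 is an arbitrary assumption made so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for repeatability

random_floats = rng.random(5)              # 5 floats in [0, 1)
random_ints = rng.integers(1, 10, size=5)  # 5 integers from 1 to 9 (10 is excluded)
temps = rng.uniform(20, 35, size=(7, 4))   # 7 days x 4 readings, 20-35°C

print(random_floats)
print(random_ints)
print(temps.shape)  # (7, 4)
```

Keeping the generator in a variable makes the source of randomness explicit, which can help when several parts of a program need independent, reproducible streams.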

3. Array Properties and Attributes #

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Shape - dimensions of array
print(arr.shape)  # (3, 4) - 3 rows, 4 columns

# Size - total number of elements
print(arr.size)   # 12

# Dimensions
print(arr.ndim)   # 2 (2D array)

# Data type
print(arr.dtype)  # int64 (or int32 depending on system)

# Memory usage
print(arr.nbytes) # bytes used

# Example with real data
student_scores = np.array([[85, 90, 78, 92],
                          [88, 76, 85, 89],
                          [92, 88, 91, 87]])

print(f"Class size: {student_scores.shape[0]} students")  # 3 students
print(f"Number of tests: {student_scores.shape[1]} tests")  # 4 tests
print(f"Total scores recorded: {student_scores.size}")  # 12 scores
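To make the link between these attributes concrete, here is a small sketch showing that `nbytes` is `size` multiplied by `itemsize`. The dtype is fixed to `int64` explicitly (an assumption for the example) so the numbers are the same on every system.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype=np.int64)

print(arr.itemsize)             # 8 bytes per int64 element
print(arr.size * arr.itemsize)  # 96
print(arr.nbytes)               # 96

# Converting to a smaller dtype shrinks the footprint
small = arr.astype(np.int16)
print(small.nbytes)             # 24
```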

4. Array Indexing and Slicing #

1D Array Indexing #

arr = np.array([10, 20, 30, 40, 50])

# Positive indexing
print(arr[0])   # 10 (first element)
print(arr[2])   # 30 (third element)

# Negative indexing
print(arr[-1])  # 50 (last element)
print(arr[-2])  # 40 (second to last)

# Slicing [start:stop:step]
print(arr[1:4])   # [20 30 40]
print(arr[:3])    # [10 20 30] (first 3)
print(arr[2:])    # [30 40 50] (from index 2 to end)
print(arr[::2])   # [10 30 50] (every 2nd element)

2D Array Indexing #

grades = np.array([[85, 90, 78],  # Student 1
                   [92, 88, 91],  # Student 2
                   [76, 82, 89]])  # Student 3

# Access specific element [row, column]
print(grades[0, 1])  # 90 (Student 1, Subject 2)
print(grades[2, 0])  # 76 (Student 3, Subject 1)

# Access entire row
print(grades[1])     # [92 88 91] (All grades for Student 2)

# Access entire column
print(grades[:, 0])  # [85 92 76] (Subject 1 for all students)

# Slicing ranges
print(grades[0:2, 1:3])  # First 2 students, subjects 2-3
"""
[[90 78]
 [88 91]]
"""

# Real-world example: Monthly sales by region
sales_data = np.array([
    [15000, 18000, 22000],  # Region 1 (Q1, Q2, Q3)
    [20000, 23000, 25000],  # Region 2
    [12000, 15000, 18000],  # Region 3
    [25000, 28000, 30000]   # Region 4
])

# Q2 sales for all regions
q2_sales = sales_data[:, 1]
print("Q2 Sales:", q2_sales)  # [18000 23000 15000 28000]

# Region 1 and 2 sales for all quarters
top_regions = sales_data[0:2, :]
print("Top 2 regions:\n", top_regions)

Boolean Indexing #

scores = np.array([85, 92, 76, 88, 91, 78, 95, 82])

# Find scores above 85
high_scores = scores[scores > 85]
print("High scores:", high_scores)  # [92 88 91 95]

# Multiple conditions
good_scores = scores[(scores >= 80) & (scores <= 90)]
print("Good scores (80-90):", good_scores)  # [85 88 82]

# Real example: Filter temperatures
temperatures = np.array([22, 35, 28, 41, 19, 33, 25, 38])
hot_days = temperatures[temperatures > 30]
print("Hot days (>30°C):", hot_days)  # [35 41 33 38]
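Besides `&` (and), masks can be combined with `|` (or) and inverted with `~` (not). The sketch below reuses the `scores` array from above; the thresholds are illustrative. Each comparison needs its own parentheses because `&`, `|`, and `~` bind more tightly than `<` and `>`.

```python
import numpy as np

scores = np.array([85, 92, 76, 88, 91, 78, 95, 82])

# OR: scores below 80 or above 90
extremes = scores[(scores < 80) | (scores > 90)]
print(extremes)  # [92 76 91 78 95]

# NOT: everything that is not above 85
not_high = scores[~(scores > 85)]
print(not_high)  # [85 76 78 82]

# Boolean indexing returns a copy, so editing it leaves the source unchanged
extremes[0] = 0
print(scores[1])  # 92
```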

5. Array Operations #

Arithmetic Operations #

# Element-wise operations
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

print(a + b)    # [6 8 10 12]
print(a - b)    # [-4 -4 -4 -4]
print(a * b)    # [5 12 21 32]
print(a / b)    # [0.2 0.33 0.43 0.5]
print(a ** 2)   # [1 4 9 16] (square)

# Operations with scalars
arr = np.array([10, 20, 30])
print(arr + 5)   # [15 25 35]
print(arr * 2)   # [20 40 60]
print(arr / 10)  # [1. 2. 3.]

# Real example: Price calculations
prices = np.array([100, 250, 75, 180])
tax_rate = 0.08

# Add tax
final_prices = prices * (1 + tax_rate)
print("Prices with tax:", final_prices)  # [108. 270. 81. 194.4]

# Apply discount
discount = 0.1
discounted_prices = prices * (1 - discount)
print("Discounted prices:", discounted_prices)  # [90. 225. 67.5 162.]

Mathematical Functions #

arr = np.array([1, 4, 9, 16, 25])

# Square root
print(np.sqrt(arr))      # [1. 2. 3. 4. 5.]

# Logarithms
print(np.log(arr))       # Natural log
print(np.log10(arr))     # Base 10 log

# Trigonometric functions
angles = np.array([0, 30, 45, 60, 90]) * np.pi / 180  # Convert to radians
print(np.sin(angles))    # Sine values
print(np.cos(angles))    # Cosine values

# Statistical functions
data = np.array([10, 15, 20, 25, 30, 35, 40])
print(f"Mean: {np.mean(data)}")        # 25.0
print(f"Median: {np.median(data)}")    # 25.0
print(f"Std Dev: {np.std(data)}")      # 10.0
print(f"Min: {np.min(data)}")          # 10
print(f"Max: {np.max(data)}")          # 40

# Real example: Test scores analysis
test_scores = np.array([78, 85, 92, 88, 76, 91, 83, 87, 90, 79])
print(f"Class average: {np.mean(test_scores):.1f}")     # 84.9
print(f"Highest score: {np.max(test_scores)}")          # 92
print(f"Lowest score: {np.min(test_scores)}")           # 76
print(f"Standard deviation: {np.std(test_scores):.1f}") # 5.4 (population std; ddof=1 would give 5.7)
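One detail worth knowing about `np.std`: by default it computes the population standard deviation (dividing by N). Passing `ddof=1` gives the sample standard deviation (dividing by N - 1), which is what many statistics packages report. A minimal sketch with the same test scores:

```python
import numpy as np

test_scores = np.array([78, 85, 92, 88, 76, 91, 83, 87, 90, 79])

population_std = np.std(test_scores)      # divides by N (NumPy's default)
sample_std = np.std(test_scores, ddof=1)  # divides by N - 1

print(f"Population std: {population_std:.1f}")  # 5.4
print(f"Sample std: {sample_std:.1f}")          # 5.7
```

Which one is appropriate depends on whether the data is the whole population or a sample drawn from it.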

6. Array Reshaping and Manipulation #

Reshaping Arrays #

# Create 1D array
arr = np.arange(12)  # [0 1 2 3 4 5 6 7 8 9 10 11]

# Reshape to different dimensions
arr_2d = arr.reshape(3, 4)  # 3 rows, 4 columns
print(arr_2d)
"""
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
"""

arr_3d = arr.reshape(2, 2, 3)  # 2 layers, 2 rows, 3 columns
print(arr_3d.shape)  # (2, 2, 3)

# Flatten back to 1D
flattened = arr_2d.flatten()
print(flattened)  # [0 1 2 3 4 5 6 7 8 9 10 11]

# Real example: Image data (pixels)
# Imagine 24 pixel values for a 6x4 image
pixels = np.arange(24)
image = pixels.reshape(6, 4)  # 6 rows, 4 columns
print("Image shape:", image.shape)  # (6, 4)
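Two reshaping details are easy to miss, so here is a short sketch of them: `-1` lets NumPy infer one dimension, and `reshape` on a contiguous array like this one returns a view, whereas `flatten` always returns a copy. The values written into the arrays are arbitrary.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size
auto_cols = arr.reshape(3, -1)
print(auto_cols.shape)  # (3, 4)

# reshape returns a view here, so writes show up in the original
auto_cols[0, 0] = 99
print(arr[0])  # 99

# flatten always returns a copy, so the original is unaffected
flat = auto_cols.flatten()
flat[0] = -1
print(arr[0])  # 99
```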

Joining and Splitting Arrays #

# Concatenation
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Join 1D arrays end to end
horizontal = np.concatenate([arr1, arr2])
print(horizontal)  # [1 2 3 4 5 6]

# For 2D arrays
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Vertical stacking (row-wise)
v_stack = np.vstack([matrix1, matrix2])
print("Vertical stack:\n", v_stack)
"""
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
"""

# Horizontal stacking (column-wise)
h_stack = np.hstack([matrix1, matrix2])
print("Horizontal stack:\n", h_stack)
"""
[[1 2 5 6]
 [3 4 7 8]]
"""

# Splitting arrays
big_array = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]
split_arrays = np.split(big_array, 5)  # Split into 5 parts
print("Split arrays:", split_arrays)

# Real example: Combining quarterly sales
q1_sales = np.array([15000, 18000, 20000])
q2_sales = np.array([22000, 25000, 23000])
q3_sales = np.array([27000, 30000, 28000])
q4_sales = np.array([31000, 29000, 26000])

# Combine all quarters
yearly_sales = np.vstack([q1_sales, q2_sales, q3_sales, q4_sales])
print("Yearly sales by quarter:\n", yearly_sales)
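`np.split` with an integer requires the array to divide evenly. As a sketch of two alternatives, `np.array_split` tolerates uneven sizes, and `np.split` also accepts a list of indices at which to cut. The split counts and indices below are arbitrary examples.

```python
import numpy as np

big_array = np.arange(10)

# np.split(big_array, 3) would raise ValueError: 10 is not divisible by 3
parts = np.array_split(big_array, 3)
for part in parts:
    print(part)
# [0 1 2 3]
# [4 5 6]
# [7 8 9]

# Split at explicit indices instead of into equal parts
first, middle, last = np.split(big_array, [2, 7])
print(first, middle, last)  # [0 1] [2 3 4 5 6] [7 8 9]
```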

7. Broadcasting #

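Broadcasting lets NumPy apply element-wise operations to arrays with different shapes, as long as the shapes are compatible: compared from the trailing dimension backward, each pair of dimensions must be equal or one of them must be 1. The smaller array is then treated as if it were stretched to match, without copying its data.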

# Example 1: Array + Scalar
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
result = arr + 10  # Adds 10 to every element
print(result)
"""
[[11 12 13]
 [14 15 16]]
"""

# Example 2: Different shaped arrays
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row_vector = np.array([10, 20, 30])

result = matrix + row_vector  # Adds row_vector to each row
print(result)
"""
[[11 22 33]
 [14 25 36]
 [17 28 39]]
"""

# Real example: Apply different discounts to products
prices = np.array([[100, 200, 150],  # Electronics
                   [50, 75, 25],     # Books
                   [30, 40, 35]])    # Food

# Different discount rates for each category
discounts = np.array([0.1, 0.05, 0.15])  # 10%, 5%, 15%

# Apply discounts (broadcasting)
discounted_prices = prices * (1 - discounts.reshape(-1, 1))
print("Discounted prices:\n", discounted_prices)
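To see which shapes are compatible without running the actual arithmetic, `np.broadcast_shapes` can be used. This sketch assumes NumPy 1.20 or newer, where that function is available. It also shows why the discount example needs `reshape(-1, 1)`.

```python
import numpy as np

# Shapes are compared from the right; each pair of dimensions must be
# equal, or one of them must be 1.
print(np.broadcast_shapes((3, 3), (3,)))    # (3, 3)
print(np.broadcast_shapes((3, 3), (3, 1)))  # (3, 3)
print(np.broadcast_shapes((2, 1), (1, 4)))  # (2, 4)

try:
    np.broadcast_shapes((2, 2), (3,))
except ValueError as exc:
    print("Incompatible:", exc)

# A (3,) vector lines up with columns, while a (3, 1) column lines up with rows.
prices = np.array([[100, 200, 150], [50, 75, 25], [30, 40, 35]])
discounts = np.array([0.1, 0.05, 0.15])
per_row = prices * (1 - discounts.reshape(-1, 1))  # one discount per category (row)
per_col = prices * (1 - discounts)                 # one discount per column
print(per_row[0])  # 90, 180, 135: the 10% discount applied across the first row
print(per_col[0])  # 90, 190, 127.5: a different discount in each column
```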

8. Advanced Array Operations #

Sorting #

# 1D sorting
scores = np.array([85, 92, 76, 88, 91, 78, 95, 82])
sorted_scores = np.sort(scores)
print("Sorted scores:", sorted_scores)  # [76 78 82 85 88 91 92 95]

# Get indices that would sort the array
sort_indices = np.argsort(scores)
print("Sort indices:", sort_indices)  # [2 5 7 0 3 4 1 6]

# 2D sorting
grades = np.array([[85, 90, 78],
                   [92, 88, 91],
                   [76, 82, 89]])

# Sort along an axis (axis=0 sorts down each column, axis=1 sorts across each row)
sorted_by_column = np.sort(grades, axis=0)  # Sort each column
print("Sorted by column:\n", sorted_by_column)

# Real example: Student rankings
student_names = np.array(['Alice', 'Bob', 'Charlie', 'Diana'])
final_scores = np.array([88, 92, 76, 95])

# Get ranking (highest to lowest)
ranking_indices = np.argsort(final_scores)[::-1]  # Reverse for descending
ranked_students = student_names[ranking_indices]
ranked_scores = final_scores[ranking_indices]

print("Student Rankings:")
for i, (student, score) in enumerate(zip(ranked_students, ranked_scores)):
    print(f"{i+1}. {student}: {score}")

Unique Values and Counting #

# Find unique values
grades = np.array(['A', 'B', 'A', 'C', 'B', 'A', 'B', 'C', 'A'])
unique_grades = np.unique(grades)
print("Unique grades:", unique_grades)  # ['A' 'B' 'C']

# Count occurrences
unique_grades, counts = np.unique(grades, return_counts=True)
print("Grade counts:")
for grade, count in zip(unique_grades, counts):
    print(f"Grade {grade}: {count} students")

# Real example: Survey responses
responses = np.array([5, 4, 5, 3, 4, 5, 2, 4, 5, 3, 4, 5])
unique_responses, counts = np.unique(responses, return_counts=True)

print("Survey Results:")
for rating, count in zip(unique_responses, counts):
    print(f"Rating {rating}: {count} responses")

Conditional Operations #

# Where function
scores = np.array([85, 92, 76, 88, 91, 78, 95, 82])

# Replace scores: A (>=90), B (80-89), C (<80)
letter_grades = np.where(scores >= 90, 'A',
                        np.where(scores >= 80, 'B', 'C'))
print("Letter grades:", letter_grades)

# Real example: Temperature classification
temperatures = np.array([22, 35, 28, 41, 19, 33, 25, 38])
weather = np.where(temperatures > 35, 'Hot',
                  np.where(temperatures > 25, 'Warm', 'Cool'))
print("Weather conditions:", weather)

# Count conditions
hot_days = np.sum(temperatures > 35)
warm_days = np.sum((temperatures > 25) & (temperatures <= 35))
cool_days = np.sum(temperatures <= 25)

print(f"Hot days: {hot_days}, Warm days: {warm_days}, Cool days: {cool_days}")
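Nested `np.where` calls become hard to read beyond two or three levels. One possible alternative is `np.select`, sketched below with the same scores. The extra grade bands and the `'F'` default are assumptions added for illustration.

```python
import numpy as np

scores = np.array([85, 92, 76, 88, 91, 78, 95, 82])

# Conditions are checked in order; the first match wins
conditions = [scores >= 90, scores >= 80, scores >= 70]
choices = ['A', 'B', 'C']
letter_grades = np.select(conditions, choices, default='F')
print(letter_grades)  # ['B' 'A' 'C' 'B' 'A' 'C' 'A' 'B']
```

The `default` is given as a string here so that it matches the type of the choices.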

9. Linear Algebra with NumPy #

Matrix Operations #

# Matrix multiplication
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Dot product (matrix multiplication)
result = np.dot(A, B)
# or result = A @ B
print("Matrix multiplication:\n", result)
"""
[[19 22]
 [43 50]]
"""

# Transpose
print("Transpose of A:\n", A.T)
"""
[[1 3]
 [2 4]]
"""

# Determinant
det_A = np.linalg.det(A)
print("Determinant of A:", det_A)  # approximately -2.0 (floating-point rounding may show extra digits)

# Inverse
inv_A = np.linalg.inv(A)
print("Inverse of A:\n", inv_A)

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

Solving Linear Systems #

# Solve system: 2x + 3y = 7, x - y = 1
# In matrix form: AX = B
A = np.array([[2, 3],
              [1, -1]])
B = np.array([7, 1])

# Solve for X
X = np.linalg.solve(A, B)
print("Solution:", X)  # [2. 1.] means x=2, y=1

# Verify solution
print("Verification:", np.dot(A, X))  # Should equal B
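Because the solution is computed in floating point, a tolerance-based check is usually safer than an exact comparison. The sketch below verifies the result with `np.allclose` and shows what happens when the matrix is singular; the singular matrix is a made-up example.

```python
import numpy as np

A = np.array([[2, 3], [1, -1]])
B = np.array([7, 1])
X = np.linalg.solve(A, B)

# Floating-point results are rarely bit-exact, so compare with a tolerance
print(np.allclose(A @ X, B))  # True

# A singular matrix (second row is twice the first) has no unique solution
singular = np.array([[1, 2], [2, 4]])
try:
    np.linalg.solve(singular, B)
except np.linalg.LinAlgError as exc:
    print("Cannot solve:", exc)
```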

10. Working with Real Data #

Statistical Analysis #

# Simulate sales data for 12 months, 3 products
np.random.seed(42)  # For reproducible results
sales_data = np.random.normal(1000, 200, (12, 3))  # mean=1000, std=200

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
products = ['Product A', 'Product B', 'Product C']

# Monthly analysis
monthly_totals = np.sum(sales_data, axis=1)  # Sum across products
best_month_idx = np.argmax(monthly_totals)
worst_month_idx = np.argmin(monthly_totals)

print(f"Best month: {months[best_month_idx]} (${monthly_totals[best_month_idx]:.0f})")
print(f"Worst month: {months[worst_month_idx]} (${monthly_totals[worst_month_idx]:.0f})")

# Product analysis
product_totals = np.sum(sales_data, axis=0)  # Sum across months
best_product_idx = np.argmax(product_totals)

print(f"Best product: {products[best_product_idx]} (${product_totals[best_product_idx]:.0f})")

# Growth analysis
monthly_growth = np.diff(monthly_totals) / monthly_totals[:-1] * 100
avg_growth = np.mean(monthly_growth)
print(f"Average monthly growth: {avg_growth:.1f}%")

Data Cleaning and Processing #

# Simulate sensor data with some missing/invalid values
sensor_data = np.array([22.5, 23.1, -999, 24.2, 23.8, 25.1, -999, 24.5, 23.9])

# Clean data: replace -999 (invalid readings) with NaN
cleaned_data = np.where(sensor_data == -999, np.nan, sensor_data)

# Remove NaN values for calculations
valid_data = cleaned_data[~np.isnan(cleaned_data)]

print(f"Original data points: {len(sensor_data)}")
print(f"Valid data points: {len(valid_data)}")
print(f"Average temperature: {np.mean(valid_data):.1f}°C")
print(f"Temperature range: {np.max(valid_data) - np.min(valid_data):.1f}°C")

# Fill missing values with interpolation (simple average of neighbors)
for i in range(len(cleaned_data)):
    if np.isnan(cleaned_data[i]):
        # Use average of previous and next valid values
        prev_val = cleaned_data[i-1] if i > 0 and not np.isnan(cleaned_data[i-1]) else np.mean(valid_data)
        next_val = cleaned_data[i+1] if i < len(cleaned_data)-1 and not np.isnan(cleaned_data[i+1]) else np.mean(valid_data)
        cleaned_data[i] = (prev_val + next_val) / 2

print("Cleaned data:", cleaned_data)
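The same cleaning steps can also be sketched without an explicit loop, using `np.nanmean` for NaN-aware statistics and `np.interp` for linear interpolation. This treats the array index as the x-axis, which is an assumption that the readings are evenly spaced in time.

```python
import numpy as np

sensor_data = np.array([22.5, 23.1, -999, 24.2, 23.8, 25.1, -999, 24.5, 23.9])
cleaned = np.where(sensor_data == -999, np.nan, sensor_data)

# NaN-aware statistics skip missing values without filtering first
print(f"Mean: {np.nanmean(cleaned):.1f}")  # 23.9

# Linear interpolation over the sample positions
positions = np.arange(len(cleaned))
valid = ~np.isnan(cleaned)
filled = cleaned.copy()
filled[~valid] = np.interp(positions[~valid], positions[valid], cleaned[valid])
print(filled.round(2))
```

For an isolated gap, linear interpolation gives the average of the two neighbors; for longer gaps it draws a straight line between the nearest valid readings.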

11. Performance Tips and Best Practices #

Memory Efficiency #

# Use appropriate data types
small_integers = np.array([1, 2, 3, 4, 5], dtype=np.int8)  # 1 byte per element
large_integers = np.array([1, 2, 3, 4, 5], dtype=np.int64)  # 8 bytes per element

print(f"int8 array uses: {small_integers.nbytes} bytes")
print(f"int64 array uses: {large_integers.nbytes} bytes")

# Use views instead of copies when possible
original = np.arange(1000000)
view = original[::2]  # Every 2nd element (creates view, not copy)
copy = original[::2].copy()  # Creates actual copy

print("View shares memory with original:", np.shares_memory(original, view))
print("Copy shares memory with original:", np.shares_memory(original, copy))

Vectorization vs Loops #

import time

# Bad: Using Python loops
def slow_calculation(arr):
    result = []
    for x in arr:
        result.append(x**2 + 2*x + 1)
    return np.array(result)

# Good: Using NumPy vectorization
def fast_calculation(arr):
    return arr**2 + 2*arr + 1

# Test performance
large_array = np.random.random(100000)

# Time the slow version
start = time.perf_counter()  # perf_counter has higher resolution than time.time
slow_result = slow_calculation(large_array)
slow_time = time.perf_counter() - start

# Time the fast version
start = time.perf_counter()
fast_result = fast_calculation(large_array)
fast_time = time.perf_counter() - start

print(f"Slow version: {slow_time:.4f} seconds")
print(f"Fast version: {fast_time:.4f} seconds")
print(f"Speedup: {slow_time/fast_time:.1f}x faster")

12. Common Patterns and Recipes #

Moving Averages #

def moving_average(data, window_size):
    """Calculate moving average with given window size"""
    return np.convolve(data, np.ones(window_size)/window_size, mode='valid')

# Stock price data (simulated)
stock_prices = np.array([100, 102, 98, 105, 103, 107, 109, 106, 108, 110, 112, 108])

# 3-day moving average
ma_3 = moving_average(stock_prices, 3)
print("3-day moving average:", ma_3.round(2))

# 5-day moving average
ma_5 = moving_average(stock_prices, 5)
print("5-day moving average:", ma_5.round(2))
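Another way to sketch rolling calculations is `sliding_window_view`, which exposes each window as a row so that any reduction (mean, max, and so on) can be applied. This assumes NumPy 1.20 or newer; the rolling maximum is an extra illustration, not part of the recipe above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

stock_prices = np.array([100, 102, 98, 105, 103, 107, 109, 106, 108, 110, 112, 108])

windows = sliding_window_view(stock_prices, 3)  # shape (10, 3), one row per window
ma_3 = windows.mean(axis=1)                     # same values as the convolve version
rolling_max = windows.max(axis=1)               # highest price in each 3-day window

print("3-day moving average:", ma_3.round(2))
print("3-day rolling max:", rolling_max)
```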

Normalization and Standardization #

# Sample test scores
scores = np.array([78, 85, 92, 88, 76, 91, 83, 87, 90, 79])

# Min-Max normalization (0 to 1)
normalized = (scores - np.min(scores)) / (np.max(scores) - np.min(scores))
print("Normalized scores:", normalized.round(3))

# Z-score standardization (mean=0, std=1)
standardized = (scores - np.mean(scores)) / np.std(scores)
print("Standardized scores:", standardized.round(3))

# Check results
print(f"Standardized mean: {np.mean(standardized):.6f}")  # Should be ~0
print(f"Standardized std: {np.std(standardized):.6f}")    # Should be ~1

Finding Peaks and Valleys #

def find_peaks(data, threshold=0):
    """Find local maxima in data"""
    peaks = []
    for i in range(1, len(data)-1):
        if data[i] > data[i-1] and data[i] > data[i+1] and data[i] > threshold:
            peaks.append(i)
    return np.array(peaks)

# Sample signal data
signal = np.array([1, 3, 2, 5, 4, 6, 3, 8, 2, 4, 1])
peak_indices = find_peaks(signal)
peak_values = signal[peak_indices]

print("Peak indices:", peak_indices)
print("Peak values:", peak_values)
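The peak search can also be written without a Python loop by comparing shifted slices of the array. The function name `find_peaks_vectorized` is hypothetical; it is a sketch intended to return the same indices as the loop version above.

```python
import numpy as np

def find_peaks_vectorized(data, threshold=0):
    """Find local maxima by comparing each interior point to both neighbors."""
    data = np.asarray(data)
    middle = data[1:-1]
    is_peak = (middle > data[:-2]) & (middle > data[2:]) & (middle > threshold)
    # +1 converts positions within `middle` back to positions within `data`
    return np.flatnonzero(is_peak) + 1

signal = np.array([1, 3, 2, 5, 4, 6, 3, 8, 2, 4, 1])
print(find_peaks_vectorized(signal))  # [1 3 5 7 9]
```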

13. Integration with Other Libraries #

With Pandas #

# Convert between NumPy and Pandas
grades_array = np.array([[85, 90, 78],
                        [92, 88, 91],
                        [76, 82, 89]])

# If you have pandas installed:
# import pandas as pd
# grades_df = pd.DataFrame(grades_array, 
#                         columns=['Math', 'Science', 'English'],
#                         index=['Alice', 'Bob', 'Charlie'])
# 
# # Convert back to NumPy
# back_to_numpy = grades_df.to_numpy()  # preferred over the older .values attribute

With Matplotlib (Visualization) #

# If you have matplotlib installed:
# import matplotlib.pyplot as plt
# 
# # Generate sample data
# x = np.linspace(0, 10, 100)
# y = np.sin(x)
# 
# # Create plot
# plt.figure(figsize=(10, 6))
# plt.plot(x, y)
# plt.title('Sine Wave')
# plt.xlabel('x')
# plt.ylabel('sin(x)')
# plt.grid(True)
# plt.show()

14. Common Errors and Solutions #

Shape Mismatch Errors #

# Common error: Shape mismatch
try:
    a = np.array([[1, 2], [3, 4]])     # (2, 2)
    b = np.array([1, 2, 3])            # (3,)
    result = a + b  # This will fail
except ValueError as e:
    print("Error:", e)
    print("Solution: Make sure shapes are compatible for broadcasting")
    
    # Fix: Reshape or use compatible arrays
    b_fixed = np.array([1, 2])  # (2,) - compatible with (2, 2)
    result = a + b_fixed
    print("Fixed result:\n", result)

Index Out of Bounds #

arr = np.array([1, 2, 3, 4, 5])

# Safe indexing
def safe_index(array, index):
    if 0 <= index < len(array):
        return array[index]
    else:
        print(f"Index {index} is out of bounds for array of length {len(array)}")
        return None

# Test safe indexing
print(safe_index(arr, 2))   # 3 (valid)
print(safe_index(arr, 10))  # None (invalid)

Data Type Issues #

# Integer overflow
small_int = np.array([100], dtype=np.int8)  # Range: -128 to 127
# 200 does not fit in int8. NumPy raises no exception here;
# the value silently wraps around.
result = small_int * 2
print("Overflow result:", result)  # [-56], not [200]

# Solution: Use appropriate data type
large_int = np.array([100], dtype=np.int32)
result = large_int * 2
print("Correct result:", result)  # [200]

# Division by zero
arr = np.array([1, 2, 0, 4])
result = np.divide(10, arr, out=np.zeros_like(arr, dtype=float), where=(arr!=0))
print("Safe division:", result)  # [10.  5.  0.  2.5]
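To find out whether a value fits in an integer type before computing, `np.iinfo` reports the limits of each type. The sketch below shows one possible guard; the choice to widen to `int32` is an assumption for this example.

```python
import numpy as np

for dtype in (np.int8, np.int16, np.int32):
    info = np.iinfo(dtype)
    print(f"{dtype.__name__}: {info.min} to {info.max}")

values = np.array([100], dtype=np.int8)
needed = int(values.max()) * 2  # computed as a Python int, which cannot overflow
if needed > np.iinfo(values.dtype).max:
    values = values.astype(np.int32)  # widen before multiplying
print(values * 2)  # [200]
```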

15. Advanced Topics #

Memory Layout and Performance #

# Row-major vs Column-major order
arr_c = np.array([[1, 2, 3], [4, 5, 6]], order='C')  # C-style (row-major)
arr_f = np.array([[1, 2, 3], [4, 5, 6]], order='F')  # Fortran-style (column-major)

print("C-order flags:", arr_c.flags)
print("F-order flags:", arr_f.flags)

# Performance difference for different access patterns
import time

large_matrix = np.random.random((1000, 1000))

# Row-wise access (efficient for C-order)
start = time.perf_counter()
for i in range(1000):
    row_sum = np.sum(large_matrix[i, :])
row_time = time.perf_counter() - start

# Column-wise access (strided reads, typically slower for C-order)
start = time.perf_counter()
for j in range(1000):
    col_sum = np.sum(large_matrix[:, j])
col_time = time.perf_counter() - start

print(f"Row-wise access: {row_time:.4f}s")
print(f"Column-wise access: {col_time:.4f}s")

Custom Data Types #

# Define structured array (like a database record)
student_dtype = np.dtype([
    ('name', 'U20'),        # Unicode string, max 20 chars
    ('age', 'i4'),          # 32-bit integer
    ('grades', 'f4', (3,)), # Array of 3 32-bit floats
    ('passed', '?')         # Boolean
])

# Create structured array
students = np.array([
    ('Alice', 20, [85.5, 90.0, 78.5], True),
    ('Bob', 19, [76.0, 82.5, 88.0], True),
    ('Charlie', 21, [65.0, 70.5, 72.0], False)
], dtype=student_dtype)

print("Student names:", students['name'])
print("Average grades:", np.mean(students['grades'], axis=1))
print("Passed students:", students[students['passed']]['name'])

Advanced Indexing #

# Fancy indexing with arrays
arr = np.array([10, 20, 30, 40, 50])
indices = np.array([0, 2, 4])
selected = arr[indices]
print("Selected elements:", selected)  # [10 30 50]

# 2D fancy indexing
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

# Select specific elements
rows = np.array([0, 1, 2])
cols = np.array([1, 2, 3])
diagonal_like = matrix[rows, cols]
print("Selected elements:", diagonal_like)  # [2 7 12]

# Boolean indexing with multiple conditions
data = np.random.randint(1, 100, 20)
complex_condition = (data > 20) & (data < 80) & (data % 2 == 0)
filtered_data = data[complex_condition]
print("Filtered data:", filtered_data)
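Passing two index arrays selects one element per (row, column) pair, which is not always what is wanted. To select every combination of the chosen rows and columns instead, `np.ix_` can be used, as this sketch shows. The row and column choices are arbitrary.

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Paired indexing picks one element per (row, col) pair
pairs = matrix[[0, 2], [1, 3]]
print(pairs)  # [ 2 12]

# np.ix_ builds an open mesh: all chosen rows crossed with all chosen columns
block = matrix[np.ix_([0, 2], [1, 3])]
print(block)
# [[ 2  4]
#  [10 12]]
```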

16. Practical Projects #

Project 1: Grade Analysis System #

class GradeAnalyzer:
    def __init__(self, grades, student_names, subject_names):
        self.grades = np.array(grades)
        self.students = np.array(student_names)
        self.subjects = np.array(subject_names)
    
    def student_averages(self):
        """Calculate average grade for each student"""
        return np.mean(self.grades, axis=1)
    
    def subject_averages(self):
        """Calculate average grade for each subject"""
        return np.mean(self.grades, axis=0)
    
    def top_students(self, n=3):
        """Get top n students by average"""
        averages = self.student_averages()
        top_indices = np.argsort(averages)[-n:][::-1]
        return self.students[top_indices], averages[top_indices]
    
    def failing_students(self, threshold=60):
        """Find students with any grade below threshold"""
        failing_mask = np.any(self.grades < threshold, axis=1)
        return self.students[failing_mask]
    
    def grade_distribution(self):
        """Analyze grade distribution"""
        flat_grades = self.grades.flatten()
        a_grades = np.sum(flat_grades >= 90)
        b_grades = np.sum((flat_grades >= 80) & (flat_grades < 90))
        c_grades = np.sum((flat_grades >= 70) & (flat_grades < 80))
        d_grades = np.sum((flat_grades >= 60) & (flat_grades < 70))
        f_grades = np.sum(flat_grades < 60)
        
        return {
            'A': a_grades, 'B': b_grades, 'C': c_grades,
            'D': d_grades, 'F': f_grades
        }

# Example usage
grades_data = [
    [85, 90, 78, 92],  # Alice
    [76, 82, 88, 85],  # Bob
    [92, 95, 89, 94],  # Charlie
    [68, 72, 75, 70],  # Diana
    [95, 98, 92, 96]   # Eve
]

analyzer = GradeAnalyzer(
    grades_data,
    ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    ['Math', 'Science', 'English', 'History']
)

print("Student Averages:")
for student, avg in zip(analyzer.students, analyzer.student_averages()):
    print(f"{student}: {avg:.1f}")

print("\nTop 3 Students:")
top_students, top_scores = analyzer.top_students(3)
for student, score in zip(top_students, top_scores):
    print(f"{student}: {score:.1f}")

print("\nGrade Distribution:", analyzer.grade_distribution())

Project 2: Weather Data Analysis #

class WeatherAnalyzer:
    def __init__(self, temperatures, dates):
        self.temperatures = np.array(temperatures)
        self.dates = np.array(dates)
    
    def temperature_stats(self):
        """Basic temperature statistics"""
        return {
            'mean': np.mean(self.temperatures),
            'median': np.median(self.temperatures),
            'std': np.std(self.temperatures),
            'min': np.min(self.temperatures),
            'max': np.max(self.temperatures),
            'range': np.max(self.temperatures) - np.min(self.temperatures)
        }
    
    def find_extremes(self):
        """Find hottest and coldest days"""
        hot_day_idx = np.argmax(self.temperatures)
        cold_day_idx = np.argmin(self.temperatures)
        
        return {
            'hottest_day': self.dates[hot_day_idx],
            'hottest_temp': self.temperatures[hot_day_idx],
            'coldest_day': self.dates[cold_day_idx],
            'coldest_temp': self.temperatures[cold_day_idx]
        }
    
    def temperature_trends(self, window=7):
        """Calculate moving average trends"""
        if len(self.temperatures) < window:
            return None
        
        moving_avg = np.convolve(
            self.temperatures, 
            np.ones(window)/window, 
            mode='valid'
        )
        
        # Calculate trend (positive = warming, negative = cooling)
        trend = np.polyfit(range(len(moving_avg)), moving_avg, 1)[0]
        
        return {
            'moving_average': moving_avg,
            'trend': trend,
            'trend_description': 'Warming' if trend > 0 else 'Cooling'
        }
    
    def heat_wave_analysis(self, threshold=30, min_duration=3):
        """Detect heat waves (consecutive days above threshold)"""
        hot_days = self.temperatures > threshold
        heat_waves = []
        
        i = 0
        while i < len(hot_days):
            if hot_days[i]:
                start = i
                while i < len(hot_days) and hot_days[i]:
                    i += 1
                duration = i - start
                
                if duration >= min_duration:
                    heat_waves.append({
                        'start_date': self.dates[start],
                        'end_date': self.dates[i-1],
                        'duration': duration,
                        'max_temp': np.max(self.temperatures[start:i]),
                        'avg_temp': np.mean(self.temperatures[start:i])
                    })
            else:
                i += 1
        
        return heat_waves

# Example usage
# Generate sample weather data for 30 days
np.random.seed(42)
base_temp = 25
seasonal_variation = 5 * np.sin(np.linspace(0, 2*np.pi, 30))
daily_variation = np.random.normal(0, 3, 30)
temperatures = base_temp + seasonal_variation + daily_variation

dates = [f"2024-06-{day:02d}" for day in range(1, 31)]

weather = WeatherAnalyzer(temperatures, dates)

print("Temperature Statistics:")
stats = weather.temperature_stats()
for key, value in stats.items():
    print(f"{key.title()}: {value:.1f}°C")

print("\nExtreme Days:")
extremes = weather.find_extremes()
print(f"Hottest: {extremes['hottest_day']} ({extremes['hottest_temp']:.1f}°C)")
print(f"Coldest: {extremes['coldest_day']} ({extremes['coldest_temp']:.1f}°C)")

print("\nTrend Analysis:")
trends = weather.temperature_trends()
if trends:
    print(f"Overall trend: {trends['trend_description']} ({trends['trend']:.2f}°C/day)")

print("\nHeat Wave Analysis:")
heat_waves = weather.heat_wave_analysis(28, 2)
for i, wave in enumerate(heat_waves):
    print(f"Heat Wave {i+1}: {wave['start_date']} to {wave['end_date']}")
    print(f"  Duration: {wave['duration']} days, Max: {wave['max_temp']:.1f}°C")

Project 3: Financial Portfolio Analysis #

class PortfolioAnalyzer:
    def __init__(self, prices, stock_names):
        self.prices = np.array(prices)  # Shape: (days, stocks)
        self.stocks = np.array(stock_names)
    
    def daily_returns(self):
        """Calculate daily returns for each stock"""
        return np.diff(self.prices, axis=0) / self.prices[:-1] * 100
    
    def volatility(self):
        """Calculate volatility (standard deviation of returns)"""
        returns = self.daily_returns()
        return np.std(returns, axis=0)
    
    def cumulative_returns(self):
        """Calculate cumulative returns from start"""
        return (self.prices / self.prices[0] - 1) * 100
    
    def correlation_matrix(self):
        """Calculate correlation between stocks"""
        returns = self.daily_returns()
        return np.corrcoef(returns.T)
    
    def portfolio_performance(self, weights):
        """Calculate portfolio performance with given weights"""
        weights = np.array(weights)
        if not np.isclose(np.sum(weights), 1.0):
            raise ValueError("Weights must sum to 1.0")
        
        returns = self.daily_returns()
        portfolio_returns = np.dot(returns, weights)
        
        return {
            'daily_returns': portfolio_returns,
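            'total_return': np.sum(portfolio_returns),  # Sum of daily % returns (approximation, ignores compounding)
            'volatility': np.std(portfolio_returns),
            'sharpe_ratio': np.mean(portfolio_returns) / np.std(portfolio_returns) if np.std(portfolio_returns) > 0 else 0  # Simplified: daily, no risk-free rate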
        }
    
    def risk_metrics(self):
        """Calculate risk metrics for each stock"""
        returns = self.daily_returns()
        
        # Value at Risk (95% confidence)
        var_95 = np.percentile(returns, 5, axis=0)
        
        # Maximum drawdown
        cumulative = self.cumulative_returns()
        peak = np.maximum.accumulate(cumulative, axis=0)
        drawdown = (cumulative - peak)
        max_drawdown = np.min(drawdown, axis=0)
        
        return {
            'var_95': var_95,
            'max_drawdown': max_drawdown,
            'volatility': self.volatility()
        }

# Example usage
# Simulate stock prices for 100 days, 4 stocks
np.random.seed(42)
days = 100
stocks = 4
initial_prices = [100, 50, 200, 150]

# Generate realistic price movements
price_data = []
current_prices = np.array(initial_prices)

for day in range(days):
    # Random daily % returns: independent normal draws with a small positive drift per stock
    daily_changes = np.random.normal([0.05, 0.03, 0.02, 0.04], [2, 1.5, 3, 2.5])
    current_prices = current_prices * (1 + daily_changes/100)
    price_data.append(current_prices.copy())

portfolio = PortfolioAnalyzer(
    price_data,
    ['TECH', 'FINANCE', 'HEALTH', 'ENERGY']
)

print("Stock Performance Summary:")
print("-" * 50)
for i, stock in enumerate(portfolio.stocks):
    final_return = portfolio.cumulative_returns()[-1, i]
    volatility = portfolio.volatility()[i]
    print(f"{stock:8}: Return: {final_return:6.1f}%, Volatility: {volatility:.1f}%")

print("\nCorrelation Matrix:")
print("-" * 30)
corr_matrix = portfolio.correlation_matrix()
print("         ", end="")
for stock in portfolio.stocks:
    print(f"{stock:8}", end="")
print()
for i, stock in enumerate(portfolio.stocks):
    print(f"{stock:8}", end=" ")
    for j in range(len(portfolio.stocks)):
        print(f"{corr_matrix[i,j]:7.2f}", end=" ")
    print()

# Test different portfolio allocations
equal_weights = [0.25, 0.25, 0.25, 0.25]
tech_heavy = [0.5, 0.2, 0.2, 0.1]
conservative = [0.1, 0.4, 0.4, 0.1]

print("\nPortfolio Comparison:")
print("-" * 40)
for name, weights in [("Equal Weight", equal_weights), 
                      ("Tech Heavy", tech_heavy), 
                      ("Conservative", conservative)]:
    perf = portfolio.portfolio_performance(weights)
    print(f"{name:12}: Return: {perf['total_return']:6.1f}%, "
          f"Volatility: {perf['volatility']:.1f}%, "
          f"Sharpe: {perf['sharpe_ratio']:.2f}")
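
Maximum drawdown is easiest to check on a tiny series where the answer is known by hand. The sketch below is a standalone illustration, not part of the PortfolioAnalyzer class: the helper name max_drawdown_pct and the toy prices are assumptions made for this example. It measures each decline relative to the running peak, which is the usual convention.

```python
import numpy as np

def max_drawdown_pct(prices):
    """Largest peak-to-trough decline, as a percentage of the running peak."""
    prices = np.asarray(prices, dtype=float)
    running_peak = np.maximum.accumulate(prices, axis=0)  # highest price seen so far
    drawdown = (prices - running_peak) / running_peak * 100
    return drawdown.min(axis=0)

# Hypothetical price path: rises to 120, falls to 90, recovers to 110
toy_prices = [100, 120, 90, 110]
print(max_drawdown_pct(toy_prices))  # -25.0
```

The fall from 120 to 90 is a 25% decline from the peak, so the result is -25.0. Passing a 2D array (days x stocks) returns one value per stock because every step works along axis 0.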

17. Best Practices Summary #

Do’s ✅ #

# 1. Use vectorized operations instead of loops
arr = np.array([1, 2, 3, 4, 5])
good_way = np.sum(arr**2)  # Fast: the loop runs in compiled code
# bad_way = sum([x**2 for x in arr])  # Slow: Python-level loop over elements

# 2. Use appropriate data types (smaller types save memory if the values fit)
efficient_int = np.array([1, 2, 3], dtype=np.int32)  # 4 bytes per element
# wasteful_int = np.array([1, 2, 3], dtype=np.int64)  # 8 bytes per element

# 3. Use broadcasting for different shaped arrays
matrix = np.ones((100, 3))
row_vector = np.array([1, 2, 3])
result = matrix * row_vector  # Broadcasting works

# 4. Preallocate arrays when size is known
result = np.zeros(1000)  # Good
# result = []  # Bad for numerical work

# 5. Use views instead of copies when possible
view = arr[::2]  # Creates view (fast, memory efficient)
# copy = arr[::2].copy()  # Creates copy (slower, more memory)
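
The view/copy distinction matters because a view shares its data with the original array. The following is a minimal sketch with illustrative variable names; it uses np.shares_memory to check which results share a buffer with the source array.

```python
import numpy as np

base = np.arange(10)             # [0 1 2 3 4 5 6 7 8 9]
view = base[::2]                 # basic slicing returns a view
independent = base[::2].copy()   # an explicit copy owns its own data
fancy = base[[0, 2, 4]]          # integer-array ("fancy") indexing returns a copy

print(np.shares_memory(base, view))         # True
print(np.shares_memory(base, independent))  # False
print(np.shares_memory(base, fancy))        # False

view[0] = 99                     # writes through to base
print(base[0])                   # 99
print(independent[0])            # 0
```

Views are cheap, but a write through a view changes the original. Call .copy() when the result must be independent.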

Don’ts ❌ #

# 1. Don't modify arrays while iterating
arr = np.array([1, 2, 3, 4, 5])
# Don't do this:
# for i in range(len(arr)):
#     if arr[i] > 3:
#         arr = np.delete(arr, i)  # Shrinks arr mid-loop, so a later arr[i] raises IndexError

# Do this instead:
arr = arr[arr <= 3]  # Use boolean indexing

# 2. Don't use nested loops for array operations
# Bad:
# result = np.zeros_like(matrix)
# for i in range(matrix.shape[0]):
#     for j in range(matrix.shape[1]):
#         result[i, j] = matrix[i, j] ** 2

# Good:
result = matrix ** 2

# 3. Don't forget to handle edge cases
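def safe_divide(a, b):
    # Cast to float so the output buffer can hold true-division results
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.divide(a, b, out=np.zeros_like(a), where=(b != 0))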

# 4. Don't ignore memory layout for performance-critical code
# Be aware of C-order vs F-order for large arrays
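
To make the memory-layout point concrete, here is a small sketch (array sizes and names are arbitrary choices for illustration) that inspects the contiguity flags and strides of a C-ordered array and its Fortran-ordered counterpart.

```python
import numpy as np

c_arr = np.zeros((1000, 1000))     # C order (row-major) is the default
f_arr = np.asfortranarray(c_arr)   # same values, column-major layout

print(c_arr.flags['C_CONTIGUOUS'], c_arr.flags['F_CONTIGUOUS'])  # True False
print(f_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # False True

# Strides: number of bytes to step to reach the next element along each axis
print(c_arr.strides)  # (8000, 8)
print(f_arr.strides)  # (8, 8000)

# A single column is strided in C order but contiguous in Fortran order
print(c_arr[:, 0].flags['C_CONTIGUOUS'])  # False
print(f_arr[:, 0].flags['C_CONTIGUOUS'])  # True
```

In C order the elements of a row sit next to each other in memory, so row-wise access tends to be cache-friendly. In Fortran order the same holds for columns. Whether the difference matters in practice depends on the array size and the access pattern, so it is worth measuring before restructuring code around it.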

18. Conclusion #

NumPy is the foundation of scientific computing in Python. It provides:

Key Benefits:

  • Performance: Often 10-100x faster than pure Python loops for vectorized operations (the exact speedup depends on the workload)
  • Memory Efficiency: Compact data storage
  • Functionality: Rich mathematical and statistical functions
  • Ecosystem: Works seamlessly with pandas, matplotlib, scikit-learn
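
The performance and memory claims are easy to check on your own machine. This is a rough sketch rather than a rigorous benchmark: the array size and repeat count are arbitrary, and the timings will vary with hardware and NumPy version.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Both approaches compute the same sum of squares
loop_total = sum(x * x for x in py_list)
vec_total = int(np.sum(np_arr * np_arr))
print(loop_total == vec_total)  # True

# Time 10 repetitions of each approach
loop_time = timeit.timeit(lambda: sum(x * x for x in py_list), number=10)
vec_time = timeit.timeit(lambda: np.sum(np_arr * np_arr), number=10)
print(f"Python loop: {loop_time:.4f}s, NumPy: {vec_time:.4f}s")

# The array stores raw 8-byte integers with no per-element Python object overhead
print(np_arr.nbytes)  # 800000
```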

When to Use NumPy:

  • Mathematical computations
  • Data analysis and manipulation
  • Scientific computing
  • Machine learning preprocessing
  • Image and signal processing
  • Financial analysis

Learning Path:

  1. Beginner: Arrays, indexing, basic operations
  2. Intermediate: Broadcasting, reshaping, statistical functions
  3. Advanced: Linear algebra, custom dtypes, performance optimization

Next Steps:

  • Pandas: For structured data analysis
  • Matplotlib: For data visualization
  • Scikit-learn: For machine learning
  • SciPy: For advanced scientific computing

Learning NumPy is like learning to use a powerful calculator: once you master it, you’ll wonder how you ever did numerical work without it!

Remember: Always think in arrays, not loops! This mindset shift will make you a much more effective Python programmer for numerical tasks.

Updated on June 9, 2025