LeNet5

Train an unconstrained deep learning model for CIFAR-10 classification, using a modified LeNet5 based on this PyTorch tutorial.

Problem Description

We have a simple feed-forward network. The input is an image, which is fed through several layers to obtain the output. The logit output is used to decide the label of the input image.

[Figure: LeNet5 architecture diagram]
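To make the label-decision step concrete, here is a minimal sketch of how the logits are mapped to a predicted class. The names model and x are placeholders for the network and input batch constructed later in this notebook; this snippet is not part of the original example.

logits = model(x)            # shape [batch_size, 10]: one score per CIFAR-10 class
pred = logits.argmax(dim=1)  # predicted label = index of the largest logit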

Modules Importing

Import all necessary modules and add the PyGRANSO src folder to the system path.

[1]:
import time
import torch
import sys
## Adding PyGRANSO directories. Should be modified by user
sys.path.append('/home/buyun/Documents/GitHub/PyGRANSO')
from pygranso.pygranso import pygranso
from pygranso.pygransoStruct import pygransoStruct
from pygranso.private.getNvar import getNvarTorch
import torch.nn as nn
import torchvision.transforms as transforms
import torch.nn.functional as F
import torchvision

Data Initialization

Specify the torch device and the neural network architecture, and generate the data.

NOTE: please specify the path for downloading the data.

Use a GPU for this problem. If no CUDA device is available, please set device = torch.device('cpu').
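If you prefer an automatic fallback instead of editing the line by hand, one common pattern (not part of the original notebook) is:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # fall back to CPU when no CUDA device exists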

[2]:
device = torch.device('cuda')

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 input image channels, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# fix the random seed so the model initialization is reproducible
torch.manual_seed(0)
model = Net().to(device=device, dtype=torch.double)

transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
batch_size = 1000
trainset = torchvision.datasets.CIFAR10(root='/home/buyun/Documents/GitHub/PyGRANSO/examples', train=True, download=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=False, num_workers=2)
# data_in: take only the first batch from the training set
for i, data in enumerate(trainloader, 0):
    if i >= 1:
        break
    # get the inputs; data is a list of [inputs, labels]
    inputs, labels = data

# All the user-provided data (vectors/matrices/tensors) must be in torch tensor format.
# As PyTorch tensors are single precision by default, one must explicitly set `dtype=torch.double`.
# Also, please make sure the device of the provided torch tensors is the same as opts.torch_device.
labels = labels.to(device=device) # label/target [1000]
inputs = inputs.to(device=device, dtype=torch.double) # input data [1000,3,32,32]
Files already downloaded and verified
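As a quick sanity check on the requirements above (double precision and a matching device), you can assert the tensor properties before moving on; this snippet is an optional addition, not part of the original notebook.

assert inputs.dtype == torch.double, "inputs must be double precision for PyGRANSO"
assert inputs.device.type == device.type and labels.device.type == device.type, "tensors must live on opts.torch_device"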

Function Set-Up

Encode the optimization variables, the objective function, and the constraint functions.

Note: please strictly follow the format of comb_fn, which will be used in the PyGRANSO main algorithm.

[3]:
def user_fn(model,inputs,labels):
    # objective function
    outputs = model(inputs)
    criterion = nn.CrossEntropyLoss()
    f = criterion(outputs, labels)
    ci = None
    ce = None
    return [f,ci,ce]

comb_fn = lambda model: user_fn(model, inputs, labels)
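This example is unconstrained, so ci and ce are simply None. For constrained problems, PyGRANSO expects the constraint values to be returned in pygransoStruct objects (inequalities interpreted as c <= 0, equalities as c = 0). The sketch below follows the field-naming convention (c1, c2, ...) used in other PyGRANSO examples; the weight-norm constraint itself is purely illustrative and not part of this tutorial.

def user_fn_constrained(model, inputs, labels):
    outputs = model(inputs)
    f = nn.CrossEntropyLoss()(outputs, labels)
    # illustrative inequality constraint: keep the final layer's weight norm at most 1
    ci = pygransoStruct()
    ci.c1 = torch.norm(model.fc3.weight) - 1.0   # interpreted as norm(W_fc3) - 1 <= 0
    ce = None                                    # no equality constraints
    return [f, ci, ce]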

User Options

Specify user-defined options for PyGRANSO.

[4]:
opts = pygransoStruct()
opts.torch_device = device
nvar = getNvarTorch(model.parameters())
opts.x0 = torch.nn.utils.parameters_to_vector(model.parameters()).detach().reshape(nvar,1)
opts.opt_tol = 1e-3
# opts.fvalquit = 1e-6
opts.print_level = 1
opts.print_frequency = 10
# opts.print_ascii = True
opts.limited_mem_size = 100
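As an optional sanity check (not in the original notebook), you can verify that the initial point x0 is the expected flattened column vector of all model parameters; an iteration cap can also be set if you want to bound the run, assuming the maxit option name carries over from GRANSO.

assert opts.x0.shape == (nvar, 1)   # one column vector containing every model parameter
# opts.maxit = 500                  # optional: cap the number of PyGRANSO iterations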

Initial Test

Check the initial accuracy of the modified LeNet5 model.

[5]:
outputs = model(inputs)
acc = (outputs.max(1)[1] == labels).sum().item()/labels.size(0)

print("Initial acc = {}".format(acc))
Initial acc = 0.105
/home/buyun/anaconda3/envs/cuosqp_pygranso/lib/python3.9/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448255797/work/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)

Main Algorithm

[6]:
start = time.time()
soln = pygranso(var_spec=model, combined_fn=comb_fn, user_opts=opts)
end = time.time()
print("Total Wall Time: {}s".format(end - start))


╔═════ QP SOLVER NOTICE ════════════════════════════════════════════════════════════════════════╗
║  PyGRANSO requires a quadratic program (QP) solver that has a quadprog-compatible interface,  ║
║  the default is osqp. Users may provide their own wrapper for the QP solver.                  ║
║  To disable this notice, set opts.quadprog_info_msg = False                                   ║
╚═══════════════════════════════════════════════════════════════════════════════════════════════╝
══════════════════════════════════════════════════════════════════════════════════════════════╗
PyGRANSO: A PyTorch-enabled port of GRANSO with auto-differentiation                          ║
Version 1.2.0                                                                                 ║
Licensed under the AGPLv3, Copyright (C) 2021-2022 Tim Mitchell and Buyun Liang               ║
══════════════════════════════════════════════════════════════════════════════════════════════╣
Problem specifications:                                                                       ║
 # of variables                     :   62006                                                 ║
 # of inequality constraints        :       0                                                 ║
 # of equality constraints          :       0                                                 ║
══════════════════════════════════════════════════════════════════════════════════════════════╣
Limited-memory mode enabled with size = 100.                                                  ║
NOTE: limited-memory mode is generally NOT                                                    ║
recommended for nonsmooth problems.                                                           ║
═════╦════════════╦════════════════╦═════════════╦═══════════════════════╦════════════════════╣
     ║ Penalty Fn ║                ║  Violation  ║ <--- Line Search ---> ║ <- Stationarity -> ║
Iter ║ Mu │ Value ║    Objective   ║ Ineq │  Eq  ║ SD │ Evals │     t    ║ Grads │    Value   ║
═════╬════════════╬════════════════╬═════════════╬═══════════════════════╬════════════════════╣
   0 ║  - │   -   ║  2.30404643258 ║   -  │   -  ║ -  │     1 │ 0.000000 ║     1 │ 0.044542   ║
  10 ║  - │   -   ║  2.14604806758 ║   -  │   -  ║ QN │     8 │ 0.007812 ║     1 │ 15.27986   ║
  20 ║  - │   -   ║  1.83576254073 ║   -  │   -  ║ QN │     4 │ 0.125000 ║     1 │ 4.301531   ║
  30 ║  - │   -   ║  1.51636303311 ║   -  │   -  ║ QN │     4 │ 0.125000 ║     1 │ 5.412692   ║
  40 ║  - │   -   ║  1.15580906091 ║   -  │   -  ║ QN │     4 │ 0.125000 ║     1 │ 12.11675   ║
  50 ║  - │   -   ║  0.62375615977 ║   -  │   -  ║ QN │     5 │ 0.062500 ║     1 │ 7.328582   ║
  60 ║  - │   -   ║  0.13021076357 ║   -  │   -  ║ QN │     4 │ 0.125000 ║     1 │ 6.408397   ║
  70 ║  - │   -   ║  5.6679828e-04 ║   -  │   -  ║ QN │     1 │ 1.000000 ║     1 │ 0.059144   ║
═════╩════════════╩════════════════╩═════════════╩═══════════════════════╩════════════════════╣
Optimization results:                                                                         ║
F = final iterate, B = Best (to tolerance), MF = Most Feasible                                ║
═════╦════════════╦════════════════╦═════════════╦═══════════════════════╦════════════════════╣
   F ║    │       ║  5.2997040e-05 ║   -  │   -  ║    │       │          ║       │            ║
   B ║    │       ║  5.2997040e-05 ║   -  │   -  ║    │       │          ║       │            ║
═════╩════════════╩════════════════╩═════════════╩═══════════════════════╩════════════════════╣
Iterations:              73                                                                   ║
Function evaluations:    338                                                                  ║
PyGRANSO termination code: 0 --- converged to stationarity tolerance.                         ║
══════════════════════════════════════════════════════════════════════════════════════════════╝
Total Wall Time: 9.920972108840942s

Train Accuracy

[7]:
torch.nn.utils.vector_to_parameters(soln.final.x, model.parameters())
outputs = model(inputs)
acc = (outputs.max(1)[1] == labels).sum().item()/labels.size(0)
print("Train acc = {}".format(acc))
Train acc = 1.0
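Note that this is the accuracy on the single training batch the model was optimized on, so it mostly confirms that the loss was driven to (near) zero rather than saying anything about generalization. Below is a sketch for measuring accuracy on the held-out CIFAR-10 test set, reusing the same transform, path, and batch size; it is not part of the original notebook.

testset = torchvision.datasets.CIFAR10(root='/home/buyun/Documents/GitHub/PyGRANSO/examples',
                                       train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

correct, total = 0, 0
with torch.no_grad():
    for test_inputs, test_labels in testloader:
        test_inputs = test_inputs.to(device=device, dtype=torch.double)
        test_labels = test_labels.to(device=device)
        preds = model(test_inputs).max(1)[1]
        correct += (preds == test_labels).sum().item()
        total += test_labels.size(0)
print("Test acc = {}".format(correct / total))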