AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE

Wei Ao (PhD Student) and Vishnu Boddeti (Professor)
Michigan State University

FHE.org Conference, Toronto, Canada, 2024

Secure deep learning under fully homomorphic encryption

Deep Learning as a Service (DLaaS)

[Diagram: the Customer sends data over the Internet to the Cloud, which returns a Prediction]

Secure DLaaS under Fully Homomorphic Encryption (FHE)

[Diagram: the Customer sends encrypted data over the Internet; the Cloud computes on ciphertexts and returns an encrypted Prediction]

From Secure Computation to Secure Deep Learning

1. Privacy homomorphisms: the initial idea, 1978
2. Gentry's blueprint using ideal lattices, 2009
3. CKKS (HEAAN): approximate arithmetic under leveled homomorphic encryption, ASIACRYPT 2017
4. Bootstrapping for CKKS: fully homomorphic encryption, EUROCRYPT 2018
5. MPCNN: deep CNNs on the RNS-CKKS FHE scheme, ICML 2022
6. AutoFHE: automated adaption of CNNs for efficient evaluation over FHE, USENIX Security 2024

Convolutional Neural Networks (CNNs)

[Diagram: Convolution → BatchNorm → ReLU, followed by ×N Blocks (each a stack of Convolution → BatchNorm → ReLU), then AvgPool → Fully Connected → Prediction]

CNNs under Homomorphic Encryption (HE)

  • Multiplication
  • Addition
  • Rotation

[Diagram: each Convolution → BatchNorm → ReLU block becomes Convolution → BatchNorm → Polynomial]

\text{ReLU}(x)=\max(x, 0) \;\xrightarrow{\text{approximate}}\; \text{Polynomial}(x) = \sum_i a_i x^i
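As a concrete (toy) illustration of such an approximation, the sketch below fits a polynomial to ReLU with a plain NumPy least-squares fit; this is an assumption for illustration only, since schemes like MPCNN use minimax constructions instead.

```python
# Minimal sketch (assumption: plain NumPy least-squares, not the minimax
# approximations used in practice): fit Polynomial(x) = sum_i a_i x^i to
# ReLU on [-1, 1], the kind of surrogate CKKS can evaluate with only
# additions and multiplications.
import numpy as np

def fit_relu_poly(degree: int, lo: float = -1.0, hi: float = 1.0) -> np.poly1d:
    xs = np.linspace(lo, hi, 10_000)
    coeffs = np.polyfit(xs, np.maximum(xs, 0.0), degree)  # a_d, ..., a_0
    return np.poly1d(coeffs)

p = fit_relu_poly(degree=7)
x = np.linspace(-1.0, 1.0, 5)
print(np.maximum(x, 0.0))  # exact ReLU
print(p(x))                # polynomial surrogate
```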

Deep CNNs under Fully Homomorphic Encryption (FHE)

Level: the number of multiplications that can still be evaluated on a ciphertext

[Diagram: evaluation consumes levels from L down to 0; bootstrapping refreshes a depleted ciphertext back to level L - K]

Bootstrapping is costly:
  • High Latency
  • High Memory Footprint

Design goals:
  • Security Requirement
  • Prediction Accuracy
  • Inference Latency
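The level mechanics can be summarized with a toy counter; the sketch below is illustrative Python (not the SEAL API), and the values of L and K are made-up examples.

```python
# Toy bookkeeping of ciphertext levels (illustrative; not a real FHE API).
# Each multiplication consumes one level; when the budget is exhausted,
# bootstrapping refreshes the ciphertext to level L - K, since the
# bootstrapping circuit itself needs K levels.
L, K = 30, 14  # assumed example values

class Ciphertext:
    def __init__(self, level: int):
        self.level = level

    def multiply(self) -> None:
        if self.level == 0:
            raise RuntimeError("no levels left: bootstrap first")
        self.level -= 1  # one multiplicative level consumed

    def bootstrap(self) -> None:
        self.level = L - K  # refreshed, minus the bootstrapping depth

ct = Ciphertext(level=L)
for _ in range(40):      # a circuit deeper than L
    if ct.level == 0:
        ct.bootstrap()   # on-demand level refresh
    ct.multiply()
print(ct.level)
```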

Deep CNNs under Fully Homomorphic Encryption (FHE)

Design space, constrained by the Security Requirement, Prediction Accuracy, and Inference Latency:

Polynomial CNNs
  • Conv, BN, pooling, FC layers: packing
  • Polynomials: degree -> depth
  • Number of layers: ResNet20, ResNet32
  • Channels/kernels
  • Input image resolution

Cryptographic Parameters
  • Cyclotomic polynomial degree N
  • Level L
  • Modulus Q_\ell=\prod_{i=0}^{\ell} q_i, \; 0\leq\ell\leq L
  • Bootstrapping depth K
  • Hamming weight h
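The modulus-chain formula can be checked with toy numbers; the primes below are illustrative placeholders, far smaller than the 40-60-bit NTT-friendly primes real CKKS instantiations use.

```python
# Sketch of the RNS modulus chain Q_ell = prod_{i=0}^{ell} q_i (toy primes).
from math import prod

q = [1009, 1013, 1019, 1021]                   # toy prime chain q_0..q_3
Q = [prod(q[: l + 1]) for l in range(len(q))]  # Q_0..Q_L
print(Q)  # a level-ell ciphertext lives modulo Q_ell; rescaling: ell -> ell-1
```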


Hand-crafted Design of Polynomials for CNNs under FHE

Cryptographic Parameters: N, \; L, \; Q_\ell=\prod_{i=0}^{\ell} q_i \; (0\leq\ell\leq L), \; K, \; h

Polynomial CNNs (Polynomials: degree -> depth)

MPCNN [1]:

[1] Lee, Eunsang, et al. "Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed parallel convolutions." International Conference on Machine Learning. PMLR, 2022.

  • Fixed Architecture
  • Periodic Bootstrapping

[Diagram: Convolution → BatchNorm → high-degree Polynomial → Bootstrapping, repeated layer by layer; each high-degree polynomial drains the level budget (e.g., from level 16 down to 2, then 0), so a bootstrapping follows every activation]

Hand-crafted Design of Polynomials for CNNs under FHE

  • High-degree Polynomial (e.g., MPCNN)
  • Low-degree Polynomial (e.g., AESPA)

Polynomial Neural Architectures

How to obtain all possible polynomial neural architectures?

Key Insight: optimize the end-to-end polynomial neural architecture instead of the polynomial function.

Optimization of End-to-End Polynomial Neural Architecture

[Diagram: a network of Conv → BN → Poly blocks with layerwise polynomials Poly 1-6; after each polynomial, a bootstrapping (Boot) operation may or may not be placed, yielding a combinatorial space of polynomial neural architectures]

Optimization of End-to-End Polynomial Neural Architecture

To meet different requirements in the real world:
  • "I want a faster response"
  • "I can wait for an accurate result"

AutoFHE: Automated Adaption of CNNs under FHE

[Diagram: AutoFHE maps a non-arithmetic neural network to a family of polynomial neural networks]

EvoReLU: Evolutionary Mixed-Degree Polynomial Approximation of ReLU

Prior approaches:
  • Pruning: DeepReDuce, SAFENet, Delphi
  • Quadratic polynomials: LoLa, CryptoNets, HEMET
  • High-degree approximation: MPCNN

Forward Propagation (the d > 2 branch uses a high-degree composite polynomial [2]):

\mathrm{EvoReLU}(x) = \begin{cases} x, &\;d=1\\ \alpha_2x^2 + \alpha_1 x + \alpha_0, &\;d=2\\ x\cdot \left( \mathcal{F}(x) + 0.5 \right), &\; d > 2 \end{cases}

\mathcal{F}(x)=(f^{d_K}_K\circ \cdots \circ f^{d_k}_{k}\circ \cdots \circ f^{d_1}_1)(x), \quad 1\leq k\leq K

[2] Lee, Eunsang, Joon-Woo Lee, Jong-Seon No, and Young-Sik Kim. "Minimax approximation of sign function by composite polynomial for homomorphic comparison." IEEE Transactions on Dependable and Secure Computing 19, no. 6 (2021): 3711-3727.
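A minimal PyTorch sketch of this forward rule follows; the names `coeffs` and `compose` are illustrative assumptions, not the authors' implementation.

```python
# Sketch of EvoReLU's forward rule (assumed implementation).
import torch

def eval_poly(coeffs, x):
    """Horner evaluation of sum_i coeffs[i] * x**i (coeffs low-to-high)."""
    y = torch.zeros_like(x)
    for a in reversed(coeffs):
        y = y * x + a
    return y

def evo_relu(x, d, coeffs=None, compose=None):
    if d == 1:                 # identity: consumes no multiplicative depth
        return x
    if d == 2:                 # learned quadratic a2*x^2 + a1*x + a0
        return eval_poly(coeffs, x)
    # d > 2: composite polynomial F = f_K o ... o f_1 approximating the
    # scaled sign function, then EvoReLU(x) = x * (F(x) + 0.5)
    y = x
    for fk in compose:         # each fk is one coefficient list
        y = eval_poly(fk, y)
    return x * (y + 0.5)
```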

Differentiable Evolution

Backward Propagation:

\frac{\partial \mathrm{EvoReLU}(x)}{\partial x} = \begin{cases} 1, &\;d=1\\ 2\alpha_2x + \alpha_1, &\;d=2\\ \partial \mathrm{ReLU}(x) / \partial x, &\; d > 2 \end{cases}

  • d = 1, 2: exact gradients
  • d > 2: straight-through estimated gradient, which makes training more stable (see the sketch below)
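A sketch of that straight-through estimator, as an assumed PyTorch implementation (not the authors' code): the forward pass evaluates the polynomial surrogate while the backward pass uses the exact ReLU gradient.

```python
import torch

class EvoReLUSTE(torch.autograd.Function):
    """Straight-through estimator for the d > 2 branch of EvoReLU."""

    @staticmethod
    def forward(ctx, x, poly):
        ctx.save_for_backward(x)
        return poly(x)  # high-degree polynomial surrogate of ReLU

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # pretend the layer was ReLU: pass gradients only where x > 0
        return grad_out * (x > 0).to(grad_out.dtype), None

x = torch.randn(8, requires_grad=True)
poly = lambda t: t * (0.5 + 0.375 * t - 0.0625 * t**3)  # illustrative surrogate
y = EvoReLUSTE.apply(x, poly)
y.sum().backward()
print(x.grad)  # gradients flow as if the activation were ReLU
```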
How to optimize the end-to-end polynomial neural architecture?
Multi-objective evolutionary optimization

Joint Search for Layerwise EvoReLU and Bootstrapping Operations

[Diagram: the same network with layerwise activations EvoReLU 1-6 in place of Poly 1-6; the search jointly selects each EvoReLU and the placement of bootstrapping (Boot) operations]

Joint search problem

Multi-objective optimization

  • Flexible Architecture
  • On-demand Bootstrapping

Multi-Objective Optimization

Objectives:
  • Accuracy
  • Latency

Single objective (scalarization of multiple objectives): \alpha \cdot \text{Accuracy} + \beta \cdot \text{Latency}
  • Only generates a single solution
  • Hard to tune balancing weights
  • Not Pareto optimal

Multi-objective optimization: \min\left\{1-\text{Accuracy}, \text{\#Bootstrapping}\right\}
  • Multiple solutions on the Pareto front
  • No need to tune weights
  • Pareto optimal

Which second objective?

\min\left\{1-\text{Accuracy}, \text{Depth of polys}\right\}: reducing polynomial depth does not necessarily reduce bootstrapping operations

\min\left\{1-\text{Accuracy}, \text{\#Bootstrapping}\right\}: directly reduces bootstrapping operations

[Diagram: a depth-9 polynomial evaluated at level 4 drops 4 levels and still forces a bootstrapping, so lowering depth alone need not remove bootstraps]

Evolutionary Multi-Objective Optimization

One generation: population → crossover → mutation → non-dominated sorting → crowding distance sorting

Population: each of the population-size individuals x_i encodes one EvoReLU gene per layer (#Layers genes):

x_1: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{1 3},\text{EvoReLU}_{1 4},\cdots
x_2: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{2 3},\text{EvoReLU}_{2 4},\cdots
x_3: \text{EvoReLU}_{3 1},\text{EvoReLU}_{3 2},\text{EvoReLU}_{3 3},\text{EvoReLU}_{3 4},\cdots
x_4: \text{EvoReLU}_{4 1},\text{EvoReLU}_{4 2},\text{EvoReLU}_{4 3},\text{EvoReLU}_{4 4},\cdots

Crossover: offspring exchange layerwise genes between two parents (exchanged genes in red):

x_1: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{1 3},\text{EvoReLU}_{1 4},\cdots
x_2: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{2 3},\text{EvoReLU}_{2 4},\cdots
x_1^\prime: \text{EvoReLU}_{\color{red} 2 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{\color{red} 2 3},\text{EvoReLU}_{1 4},\cdots
x_2^\prime: \text{EvoReLU}_{\color{red} 1 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{\color{red} 1 3},\text{EvoReLU}_{2 4},\cdots

Mutation: genes of an individual are randomly perturbed (mutated genes in red):

x_1: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},\text{EvoReLU}_{1 3},\text{EvoReLU}_{1 4},\cdots
x_2: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},\text{EvoReLU}_{2 3},\text{EvoReLU}_{2 4},\cdots
x_3: \text{EvoReLU}_{3 1},\text{EvoReLU}_{3 2},\text{EvoReLU}_{3 3},\text{EvoReLU}_{3 4},\cdots
x_4: \text{EvoReLU}_{4 1},\text{EvoReLU}_{4 2},\text{EvoReLU}_{4 3},\text{EvoReLU}_{4 4},\cdots
x_1^\prime: \text{EvoReLU}_{1 1},\text{EvoReLU}_{1 2},{\color{red} \text{EvoReLU}_{1 3}^\prime},\text{EvoReLU}_{1 4},\cdots
x_2^\prime: \text{EvoReLU}_{2 1},\text{EvoReLU}_{2 2},{\color{red} \text{EvoReLU}_{2 3}^\prime},\text{EvoReLU}_{2 4},\cdots
x_3^\prime: {\color{red} \text{EvoReLU}_{3 1}^\prime},\text{EvoReLU}_{3 2},\text{EvoReLU}_{3 3},\text{EvoReLU}_{3 4},\cdots
x_4^\prime: \text{EvoReLU}_{4 1}, { \color{red} \text{EvoReLU}_{4 2}^\prime},\text{EvoReLU}_{4 3},\text{EvoReLU}_{4 4},\cdots
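A minimal sketch of these two operators on degree-encoded genes; the degree set and probabilities below are illustrative assumptions, not the paper's search space.

```python
# Uniform crossover and random-resample mutation over layerwise gene lists.
import random

DEGREES = [1, 2, 3, 7, 15, 27]  # assumed per-layer EvoReLU degree choices

def crossover(x1, x2, p=0.5):
    c1, c2 = x1[:], x2[:]
    for i in range(len(c1)):
        if random.random() < p:          # exchange gene i between parents
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

def mutate(x, p=0.1):
    # resample each gene with probability p
    return [random.choice(DEGREES) if random.random() < p else g for g in x]

parent1 = [1, 2, 7, 2, 15, 1]            # one EvoReLU degree per layer
parent2 = [2, 1, 27, 7, 1, 2]
child1, child2 = crossover(parent1, parent2)
print(mutate(child1), mutate(child2))
```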

Non-dominated sorting:

x_3~\text{dominates}~x_6, x_7, \text{and}~x_8 \\ \text{i.e.}~x_3~\text{is better than}~x_6, x_7, \text{and}~x_8

Crowding distance sorting:

\text{crowding distance of}~x_3~\text{is}~ \frac{L_1+L_2}{2}
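The two selection criteria can be sketched in a few lines of NSGA-II-style Python (an assumed implementation; both objectives are minimized, e.g. (1 - accuracy, #bootstrapping)).

```python
def dominates(fa, fb):
    """fa dominates fb: no worse in every objective, better in at least one."""
    return all(a <= b for a, b in zip(fa, fb)) and any(
        a < b for a, b in zip(fa, fb))

def crowding_distance(front):
    """Per-point spread estimate used to keep the Pareto front diverse."""
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        dist[order[0]] = dist[order[-1]] = float("inf")   # keep the extremes
        span = front[order[-1]][k] - front[order[0]][k] or 1.0
        for j in range(1, n - 1):                         # interior points
            dist[order[j]] += (front[order[j + 1]][k]
                               - front[order[j - 1]][k]) / span
    return dist

front = [(0.08, 2), (0.10, 1), (0.06, 4)]  # (1 - accuracy, #bootstrapping)
print(dominates((0.08, 2), (0.10, 4)))     # True
print(crowding_distance(front))
```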

These five steps form one generation; the loop repeats until the search budget is exhausted.

How to fine-tune polynomial CNNs?
Neural network adaption of the trainable weights:
  • Inherit the representation learning ability of the ReLU network
  • Adapt trainable weights to EvoReLU

Trainable Weight Adaption and Knowledge Transferring

[Diagram: knowledge transfer from the ReLU Network (Conv → BN → ReLU blocks) to the Polynomial Network (Conv → BN → EvoReLU blocks), which keeps the trainable Conv/BN weights]

\mathcal{L}_{train} = (1 - \tau) \mathcal{L}_{CE}+\tau \mathcal{L}_{KL}

Fine-tuning objective: cross-entropy on the labels plus KL-divergence distillation from the ReLU network, balanced by \tau
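A PyTorch sketch of this objective (an assumed implementation; tau is the mixing weight from the formula above):

```python
import torch
import torch.nn.functional as F

def finetune_loss(student_logits, teacher_logits, labels, tau=0.5):
    # cross-entropy against ground-truth labels
    ce = F.cross_entropy(student_logits, labels)
    # KL distillation from the pretrained ReLU teacher to the polynomial student
    kl = F.kl_div(F.log_softmax(student_logits, dim=1),
                  F.log_softmax(teacher_logits, dim=1),
                  log_target=True, reduction="batchmean")
    return (1 - tau) * ce + tau * kl

student = torch.randn(4, 10, requires_grad=True)  # polynomial network logits
teacher = torch.randn(4, 10)                      # ReLU network logits
labels = torch.randint(0, 10, (4,))
finetune_loss(student, teacher, labels).backward()
```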

Experiments on encrypted CIFAR10 dataset under FHE

Dataset: CIFAR10

50,000 training images

10,000 test images

32x32 resolution, 10 classes

Hardware & Software

Amazon AWS, r5.24xlarge

96 CPUs, 768 GB RAM

Microsoft SEAL 3.6

Experimental Setup

Latency and Accuracy Trade-offs under FHE

Approach     MPCNN        AESPA       REDsec    AutoFHE
Venue        ICML22       arXiv22     NDSS23    USENIX24
Scheme       CKKS         CKKS        TFHE      CKKS
Polynomial   high-degree  low-degree  n/a       mixed
Layerwise    no           no          n/a       yes
Strategy     approx       train       train     adapt
Arch         manual       manual      manual    search

Multiplicative Depth of Layerwise EvoReLU

[Figure: multiplicative depth of each layerwise EvoReLU across the network]

Conclusion
  • Multi-objective optimization generates Pareto-optimal solutions to meet different requirements
  • Joint optimization of layerwise EvoReLU and bootstrapping yields optimal polynomial neural architectures
  • Adapted neural networks inherit the representation learning ability of ReLU networks

AutoFHE optimizes the end-to-end polynomial neural architecture.

Paper, Code & Online Slides