Diffprivlib v0.6

Python versions Downloads PyPi version PyPi status General tests Documentation Status CodeQL codecov

Diffprivlib is a general-purpose library for experimenting with, investigating and developing applications in, differential privacy.

Use diffprivlib if you are looking to:

  • Experiment with differential privacy
  • Explore the impact of differential privacy on machine learning accuracy using classification and clustering models
  • Build your own differential privacy applications, using our extensive collection of mechanisms

Diffprivlib supports Python versions 3.8 to 3.11.

We're using the Iris dataset, so let's load it and perform an 80/20 train/test split.

from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2)
sklearn
diffprivlib.models.GaussianNBboundsepsilonGaussianNB(epsilon=0.1)epsilon = 1.0
from diffprivlib.models import GaussianNB

clf = GaussianNB()
clf.fit(X_train, y_train)

We can now classify unseen examples, knowing that the trained model is differentially private and preserves the privacy of the 'individuals' in the training set (flowers are entitled to their privacy too!).

clf.predict(X_test)
.fit()
print("Test accuracy: %f" % clf.score(X_test, y_test))
epsilonmatplotlib
import numpy as np
import matplotlib.pyplot as plt

epsilons = np.logspace(-2, 2, 50)
bounds = ([4.3, 2.0, 1.1, 0.1], [7.9, 4.4, 6.9, 2.5])
accuracy = list()

for epsilon in epsilons:
    clf = GaussianNB(bounds=bounds, epsilon=epsilon)
    clf.fit(X_train, y_train)
    
    accuracy.append(clf.score(X_test, y_test))

plt.semilogx(epsilons, accuracy)
plt.title("Differentially private Naive Bayes accuracy")
plt.xlabel("epsilon")
plt.ylabel("Accuracy")
plt.show()

Differentially private naive Bayes

Congratulations, you've completed your first differentially private machine learning task with the Differential Privacy Library! Check out more examples in the notebooks directory, or dive straight in.

Contents

Diffprivlib is comprised of four major components:

BudgetAccountant

Setup

pip
pippip3
pip install diffprivlib

Manual installation

For the most recent version of the library, either download the source code or clone the repository in your directory of choice:

git clone https://github.com/IBM/differential-privacy-library
diffprivlibpython3 -m pip install .
pip install .
pytestpytest
pytest

Citing diffprivlib

If you use diffprivlib for research, please consider citing the following reference paper:

@article{diffprivlib,
  title={Diffprivlib: the {IBM} differential privacy library},
  author={Holohan, Naoise and Braghin, Stefano and Mac Aonghusa, P{\'o}l and Levacher, Killian},
  year={2019},
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  volume = {1907.02444 [cs.CR]},
  primaryClass = "cs.CR",
  month = jul
}

References

  • Holohan, N., Antonatos, S., Braghin, S. and Mac Aonghusa, P., 2018. The Bounded Laplace Mechanism in Differential privacy. arXiv preprint arXiv:1808.10410.
  • Holohan, N., Braghin, S., Mac Aonghusa, P. and Levacher, K., 2019. Diffprivlib: the IBM Differential Privacy Library. arXiv preprint arXiv:1907.02444.
  • Ludwig, H., Baracaldo, N., Thomas, G., Zhou, Y., Anwar, A., Rajamoni, S., Ong, Y., Radhakrishnan, J., Verma, A., Sinn, M. and Purcell, M., 2020. IBM Federated Learning: an Enterprise Framework White Paper v0. 1. arXiv preprint arXiv:2007.10987.
  • Holohan, N. and Braghin, S., 2021. Secure Random Sampling in Differential Privacy. In Computer Security–ESORICS 2021: 26th European Symposium on Research in Computer Security, Darmstadt, Germany, October 4–8, 2021, Proceedings, Part II 26 (pp. 523-542). Springer International Publishing.