DWD Classifiers

class dwd.socp_dwd.DWD(C=1.0, solver_kws=None)

property direction

The separating hyperplane is of the form ‘p.d = d.i’, where ‘.’ is the dot product. If ‘p.d < d.i’, then ‘p’ is classified label 0. If ‘p.d > d.i’ then it is classified label 1.

Returns

direction (np.ndarray) – The DWD separating direction; normal to the hyperplane.
intercept (float) – The intercept of the separating hyperplane.
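The labeling rule described for this property can be sketched with NumPy. This is only an illustration, not the library's implementation: `classify`, `direction`, and `threshold` are hypothetical names, and `threshold` stands in for the right-hand side that the text writes as ‘d.i’. The text does not say how a point lying exactly on the hyperplane is labeled; this sketch arbitrarily assigns it label 0.

```python
import numpy as np

def classify(points, direction, threshold):
    """Label each row of `points` by the side of the hyperplane it falls on.

    Rows whose projection onto `direction` exceeds `threshold` get label 1,
    all others get label 0 (ties included, which is an assumption).
    """
    projections = np.asarray(points, dtype=float) @ np.asarray(direction, dtype=float)
    return (projections > threshold).astype(int)

# Hypothetical direction and threshold, chosen only for the example.
direction = np.array([1.0, 0.0])
threshold = 0.5
points = np.array([[0.0, 3.0], [2.0, -1.0]])
labels = classify(points, direction, threshold)
```

With these values the first point projects to 0.0 and the second to 2.0, so they fall on opposite sides of the hyperplane.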

fit(X, y, sample_weight=None)

Fit the model according to the given training data.

Parameters

X ({array-like, sparse matrix}, shape = [n_samples, n_features]) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like, shape = [n_samples]) – Target vector relative to X.
sample_weight (array-like, shape = [n_samples], optional) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns

self

Return type

object

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns

score – Mean accuracy of self.predict(X) wrt. y.

Return type

float
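As a rough illustration of what mean accuracy with optional sample weights computes, the sketch below reimplements the metric with NumPy. `mean_accuracy` is a hypothetical helper written for this example; it does not call the estimator and is not the library's code.

```python
import numpy as np

def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of predictions that match the true labels."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    # np.average falls back to a plain mean when weights is None.
    return float(np.average(correct, weights=sample_weight))

y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]  # the third prediction is wrong

unweighted = mean_accuracy(y_true, y_pred)
weighted = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```

Giving the misclassified sample a larger weight lowers the score, which is the effect sample_weight has on the metric.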

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class dwd.gen_dwd.GenDWD(lambd=1.0, q=1, implicit_P=True)

Generalized Distance Weighted Discrimination

Solves the gDWD problem using the MM algorithm derived in Wang and Zou, 2017.

Primary reference: Another look at distance-weighted discrimination by Boxiang Wang and Hui Zou, 2017

Note that the tuning parameter lambd is on a different scale than the parameter C used in the SOCP formulation.

Parameters

lambd (float) – Tuning parameter for DWD.
q (float) – Tuning parameter for generalized DWD (the exponent on the margin terms). When q = 1, gDWD is equivalent to DWD.
implicit_P (bool) – Whether to use the implicit P^{-1} gamma formulation (in the publication) or the explicit computation (in the arxiv version).
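The role of q can be illustrated with the generalized DWD loss from the cited paper. As we understand it, the loss is linear in the margin u up to u = q/(q+1) and decays like u^(-q) beyond that. The sketch below is an assumption-laden illustration: `gdwd_loss` is a hypothetical helper, not part of the package, and it shows only the loss, not the MM algorithm.

```python
import numpy as np

def gdwd_loss(u, q=1.0):
    """Generalized DWD loss V_q(u), as we read it from Wang and Zou (2017).

    V_q(u) = 1 - u                              if u <= q / (q + 1)
           = q**q / ((q + 1)**(q + 1) * u**q)   otherwise
    """
    u = np.asarray(u, dtype=float)
    cutoff = q / (q + 1.0)
    const = q**q / (q + 1.0) ** (q + 1.0)
    # Substitute 1.0 where the power branch is not used, so that branch
    # never divides by zero or takes a fractional power of a negative number.
    safe_u = np.where(u > cutoff, u, 1.0)
    return np.where(u > cutoff, const / safe_u**q, 1.0 - u)

margins = np.array([0.0, 0.5, 1.0, 2.0])
dwd_values = gdwd_loss(margins, q=1.0)
```

For q = 1 the two pieces meet at u = 1/2 and the tail is 1/(4u), which is the standard DWD loss.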

class dwd.gen_kern_dwd.KernGDWD(lambd=1.0, q=1.0, kernel='linear', kernel_kws={}, implicit_P=True)

Kernel Generalized Distance Weighted Discrimination

Solves the kernel gDWD problem using the MM algorithm derived in Wang and Zou, 2017.

Primary reference: Another look at distance-weighted discrimination by Boxiang Wang and Hui Zou, 2017

Note that the tuning parameter lambd is on a different scale than the parameter C used in the SOCP formulation.

Parameters

lambd (float) – Tuning parameter for DWD.
q (float) – Tuning parameter for generalized DWD (the exponent on the margin terms). When q = 1, gDWD is equivalent to DWD.
kernel (str, callable(X, Y, **kwargs)) – The kernel to use.
kernel_kws (dict) – Any key word arguments for the kernel.
implicit_P (bool) – Whether to use the implicit P^{-1} gamma formulation (in the publication) or the explicit computation (in the arxiv version).
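A callable kernel with the documented signature callable(X, Y, **kwargs) might look like the sketch below. It assumes the callable should return the Gram matrix of shape (number of rows in X, number of rows in Y) and that the entries of kernel_kws are forwarded as keyword arguments; neither detail is confirmed here. `rbf_kernel` and its `gamma` parameter are names chosen for the example.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq_dists = (
        (X**2).sum(axis=1)[:, None]
        + (Y**2).sum(axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Round-off can make tiny distances slightly negative; clip them to zero.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X, gamma=0.5)
```

Under these assumptions, such a function would be passed as kernel=rbf_kernel together with kernel_kws={'gamma': 0.5}.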

Cross-validation

class dwd.gen_dwd.GenDWDCV(lambd_vals=array([0.01, 0.027825594, 0.0774263683, 0.215443469, 0.59948425, 1.66810054, 4.64158883, 12.9154967, 35.9381366, 100.0]), q_vals=array([0.01, 0.1, 1.0, 10.0, 100.0]), cv=5, scoring='accuracy')

Fits Generalized DWD with cross-validation. gDWD cross-validation can be significantly faster if certain quantities are precomputed.

Parameters

lambd_vals (list of floats) – The lambda values to cross-validate over.
q_vals (list of floats) – The q values to cross-validate over.
cv – How to perform cross-validation. See documentation in sklearn.model_selection.GridSearchCV.
scoring – What metric to use to score cross-validation. See documentation in sklearn.model_selection.GridSearchCV.
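The default values in the signature appear to be log-spaced grids between 0.01 and 100. The sketch below reproduces them with NumPy; whether the library builds its defaults this way is an assumption, and the snippet shows only how to construct a comparable grid.

```python
import numpy as np

# Ten lambda values and five q values, evenly spaced on a log10 scale
# from 10**-2 to 10**2.
lambd_vals = np.logspace(-2, 2, 10)
q_vals = np.logspace(-2, 2, 5)

# Every (lambd, q) pair is a candidate, so the grid grows multiplicatively.
n_candidates = len(lambd_vals) * len(q_vals)
```

A finer or narrower grid can be built the same way and passed as lambd_vals and q_vals.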

class dwd.gen_kern_dwd.KernGDWDCV(lambd_vals=array([0.01, 0.027825594, 0.0774263683, 0.215443469, 0.59948425, 1.66810054, 4.64158883, 12.9154967, 35.9381366, 100.0]), q_vals=array([0.01, 0.1, 1.0, 10.0, 100.0]), kernel='linear', kernel_kws_vals={}, cv=5, scoring='accuracy')

Fits kernel gDWD with cross-validation. gDWD cross-validation can be significantly faster if certain quantities are precomputed.

Parameters

lambd_vals (list of floats) – The lambda values to cross-validate over.
q_vals (list of floats) – The q values to cross-validate over.
kernel (str, callable) – The kernel to use.
kernel_kws_vals (list of dicts) – The kernel parameters to cross-validate over.
cv – How to perform cross-validation. See documentation in sklearn.model_selection.GridSearchCV.
scoring – What metric to use to score cross-validation. See documentation in sklearn.model_selection.GridSearchCV.