knnpredict
Classify new data into categories using the kNN algorithm.
label = knnpredict (X, Y, XC)
returns the matrix of labels predicted for the corresponding instances in XC, using the predictor data in X and the corresponding class labels in Y. X is used to train the kNN model, and each row of XC is classified into one of the classes in Y.
X must be a numeric matrix of input data, where rows correspond to observations and columns correspond to features or variables. X is used to train the kNN model.
Y is a matrix or cell array containing the class labels of the corresponding predictor data in X. Y can contain any type of categorical data. Y must have the same number of rows as X.
XC must be a numeric matrix of query (new) points that are to be classified into the labels in Y. XC must have the same number of columns as X.
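Conceptually, the classification above amounts to a majority vote among the K training points nearest to each query row. Here is a minimal sketch in Python (illustrative only; the function name and the use of the plain Euclidean metric are assumptions, not the Octave implementation):

```python
# Sketch of kNN classification: label each row of XC by a majority vote
# among its K nearest rows of X (assumed behaviour, not the Octave source).
import numpy as np
from collections import Counter

def knn_predict(X, Y, XC, K=1):
    labels = []
    for point in XC:
        # Euclidean distance from the query point to every training row.
        dists = np.sqrt(((X - point) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:K]          # indices of the K closest rows
        votes = Counter(Y[i] for i in nearest)   # majority vote over their labels
        labels.append(str(votes.most_common(1)[0][0]))
    return labels

X  = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
Y  = np.array(["a", "a", "b", "b"])
XC = np.array([[0.05, 0.1], [5.0, 5.1]])
print(knn_predict(X, Y, XC, K=3))  # prints ['a', 'b']
```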
[label, score, cost] = knnpredict (…)
also returns score, which contains the predicted class scores or posterior probabilities for each instance over the unique classes in Y, and cost, a matrix containing the expected cost of the classifications. Each row in cost holds the expected cost of classifying the corresponding observation in XC into each of the unique classes in Y.
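One common way to obtain such scores and costs (a hedged sketch; the exact formulas used by knnpredict may differ) is to take the class shares among the K nearest neighbors as posterior estimates, and weight a misclassification cost matrix by those posteriors:

```python
# Illustrative sketch (assumed semantics, not quoted from the Octave source):
# scores as neighbor class shares, expected cost as posterior-weighted cost.
import numpy as np

def knn_scores(X, Y, xc, K, C):
    classes = np.unique(Y)
    dists = np.sqrt(((X - xc) ** 2).sum(axis=1))
    nearest = Y[np.argsort(dists)[:K]]
    # score[c]: share of the K neighbors labelled with class c (posterior estimate)
    score = np.array([(nearest == c).mean() for c in classes])
    # cost[j]: expected cost of assigning class j, weighting C by the posteriors
    cost = score @ C
    return classes, score, cost

X = np.array([[0.0], [0.1], [1.0], [1.1]])
Y = np.array(["a", "a", "b", "b"])
C = 1 - np.eye(2)   # hypothetical 0/1 misclassification cost matrix
classes, score, cost = knn_scores(X, Y, np.array([0.05]), K=3, C=C)
print(score)  # posterior estimates over ["a", "b"]
print(cost)   # expected cost of assigning each class
```

With the three nearest neighbors labelled a, a, b, the scores are [2/3, 1/3] and the expected costs [1/3, 2/3], so assigning class "a" minimizes the expected cost.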
label = knnpredict (…, Name, Value)
returns a matrix label containing the predicted labels, with additional parameters specified by the Name-Value pair arguments listed below.
| Name | Value |
|---|---|
| "K" | The number of nearest neighbors to be found in the kNN search. It must be a positive integer. The default is 1. |
| "weights" | A numeric non-negative matrix of observational weights. Each row in weights corresponds to a row in Y and indicates the relative importance (weight) of that observation when finding the nearest neighbors; negative values are removed before the calculation when weights are specified. The default is weights = ones (rows (Y), 1). |
| "P" | The Minkowski distance exponent, which must be a positive scalar. This argument is only valid when the selected distance metric is "minkowski". The default is 2. |
| "scale" | The scale parameter for the standardized Euclidean distance. It must be a nonnegative numeric vector whose length equals the number of columns in X. This argument is only valid when the selected distance metric is "seuclidean", in which case each coordinate of X is scaled by the corresponding element of "scale", as is each query point in XC. By default, the scale parameter is the standard deviation of each coordinate in X. |
| "cov" | The covariance matrix for computing the Mahalanobis distance. It must be a positive definite matrix whose size matches the number of columns in X. This argument is only valid when the selected distance metric is "mahalanobis". |
| "cost" | A numeric matrix containing the misclassification cost for the corresponding instances in X, with one column per unique category in Y. If an instance is correctly classified into its category, the cost is taken as 1; otherwise 0. The default is cost = ones (rows (X), numel (unique (Y))). |
| "BucketSize" | The maximum number of data points in the leaf node of the Kd-tree, which must be a positive integer. This argument is only valid when the selected search method is "kdtree". |
| "Distance" | The distance metric used by knnsearch, one of the following: |
| "euclidean" | Euclidean distance. |
| "seuclidean" | Standardized Euclidean distance. Each coordinate difference between the rows in X and the query matrix XC is divided by the corresponding element of the standard deviation computed from X. To specify a different scaling, use the "scale" name-value argument. |
| "cityblock" | City block distance. |
| "chebychev" | Chebychev distance (maximum coordinate difference). |
| "minkowski" | Minkowski distance. The default exponent is 2. To specify a different exponent, use the "P" name-value argument. |
| "mahalanobis" | Mahalanobis distance, computed using a positive definite covariance matrix. To change the covariance matrix, use the "cov" name-value argument. |
| "cosine" | Cosine distance. |
| "correlation" | One minus the sample linear correlation between observations (treated as sequences of values). |
| "spearman" | One minus the sample Spearman's rank correlation between observations (treated as sequences of values). |
| "hamming" | Hamming distance, which is the percentage of coordinates that differ. |
| "jaccard" | One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ. |
| "NSMethod" | The nearest neighbor search method used by knnsearch, one of the following: |
| "kdtree" | Creates and uses a Kd-tree to find nearest neighbors. "kdtree" is the default value when the number of columns in X is less than or equal to 10, X is not sparse, and the distance metric is "euclidean", "cityblock", "chebychev", or "minkowski"; otherwise, the default value is "exhaustive". This argument is only valid when the distance metric is one of those four metrics. |
| "exhaustive" | Uses the exhaustive search algorithm, computing the distance from each point in XC to every point in X. |
| "standardize" | A flag indicating whether kNN should be computed after standardizing X. |
Source Code: knnpredict