Notes in Week 6 - Clustering

Status Last Update Fields
Published 11/26/2024 In supervised learning, the training set contains both input features {{c1::x}} and target labels {{c2::y}}.
Published 11/26/2024 Clustering involves grouping objects such that similar objects are in the same {{c1::group}}, while dissimilar ones are in {{c2::different groups}}.
Published 11/26/2024 Clustering can assist with {{c1::anomaly detection}} by learning normal data patterns and identifying deviations as potential issues.
Published 11/26/2024 In {{c1::partitional clustering}}, data objects are divided into non-overlapping subsets, with each object belonging to only {{c2::one subset}}.
Published 11/26/2024 In {{c1::hierarchical clustering}}, data objects are organized in a tree structure, allowing points to be grouped into clusters at multiple {{c2::levels}}.
Published 11/26/2024 The K-Means algorithm requires specifying the number of clusters {{c1::K}} before running.
Published 11/26/2024 Each cluster in K-Means is associated with a {{c1::centroid}}, and each point is assigned to the cluster with the closest {{c2::centroid}}.
Published 11/26/2024 The K-Means cost function minimizes the sum of squared distances between each point and its {{c1::assigned centroid}}.
Published 11/26/2024 K-Means does not guarantee finding the {{c1::global minimum}} of the objective function due to its sensitivity to initial centroid placement.
Published 11/26/2024 Random initialization in K-Means involves randomly picking {{c1::K}} training examples to set as initial centroids.
Published 11/26/2024 A common K-Means strategy is to run the algorithm multiple times with different initializations and pick the solution with the {{c1::lowest cost}}.
Published 11/26/2024 The optimal number of clusters in K-Means is often determined by examining when the cost function {{c1::decreases slowly}}, known as the {{c2::elbow method}} (sketched after the table).
Published 11/26/2024 K-Means struggles with clusters of varying sizes, densities, or non-{{c1::spherical shapes}}.
Published 11/26/2024 Hierarchical clustering results can be visualized with a {{c1::dendrogram}}, which shows how points are grouped into clusters at each level.
Published 11/26/2024 Hierarchical clustering does not require specifying the {{c1::number of clusters}} in advance.
Published 11/26/2024 In {{c1::agglomerative}} hierarchical clustering, points start as individual clusters and merge until only one or a specified number of clusters remain.
Published 11/26/2024 In {{c1::divisive}} hierarchical clustering, all points start in one cluster and are repeatedly split until reaching individual clusters or a set number of clusters.
Published 11/26/2024 Agglomerative clustering requires a {{c1::distance metric}} to measure the similarity between points or clusters.
Published 11/26/2024 Single linkage (MIN) in hierarchical clustering connects clusters based on the {{c1::shortest distance}} between points in each cluster.
Published 11/26/2024 Complete linkage (MAX) in hierarchical clustering connects clusters based on the {{c1::largest distance}} between points in each cluster.
Published 11/26/2024 Group average (average linkage) in hierarchical clustering is a {{c1::compromise}} between MIN and MAX linkage (see the linkage sketch after the table).
Published 11/26/2024 DBSCAN is a density-based clustering method where clusters are defined by {{c1::dense regions}} separated by sparse regions.
Published 11/26/2024 In DBSCAN, a point is a {{c1::core point}} if it has more than a specified number of neighbors (MinPts) within a radius (Eps).
Published 11/26/2024 In DBSCAN, a {{c1::border point}} has fewer than MinPts neighbors within Eps but is within the neighborhood of a core point.
Published 11/26/2024 A {{c1::noise point}} in DBSCAN is a point that is neither a core point nor a border point (see the DBSCAN sketch after the table).
Published 11/26/2024 Compared to K-Means, DBSCAN can handle clusters of varying {{c1::densities}} and {{c2::non-spherical shapes}}.
Published 11/26/2024 Cluster cohesion measures how {{c1::closely related}} objects within a cluster are.
Published 11/26/2024 Cluster separation measures how {{c1::distinct}} or well-separated a cluster is from other clusters.
Published 11/26/2024 Cluster cohesion is often calculated as the {{c1::within-cluster sum of squares (WSS)}}, also known as SSE.
Published 11/26/2024 Cluster separation is often calculated as the {{c1::between-cluster sum of squares (BSS)}} (see the WSS/BSS sketch after the table).
Published 11/26/2024 In cluster analysis, the {{c1::similarity matrix}} can be visualized to assess the organization of clusters.
Published 11/26/2024 Ordering the similarity matrix by cluster labels can provide hints about {{c1::cluster validity}}.
Published 11/26/2024 A key difference between supervised and unsupervised learning is that supervised learning requires {{c1::labeled data}}.
Published 11/26/2024 The K-Means algorithm repeatedly assigns points to the nearest {{c1::centroid}} and then updates each centroid to the mean of its assigned points (see the K-Means sketch after the table).
Published 11/26/2024 In hierarchical clustering, cutting the dendrogram at different levels provides {{c1::different numbers of clusters}}.
Published 11/26/2024 A benefit of hierarchical clustering over K-Means is that it can reveal {{c1::nested cluster structures}}.
Published 11/26/2024 The {{c1::choice of distance metric}} (e.g., Euclidean, Manhattan) affects the shape and structure of clusters in clustering algorithms.
Published 11/26/2024 In DBSCAN, the Eps parameter determines the {{c1::radius}} for defining dense regions, influencing the number and shape of clusters.
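
The sketches below illustrate several of the cards. First, a minimal NumPy sketch of the K-Means loop: random initialization from K training examples, repeated assignment of each point to its nearest centroid, centroid updates, and several restarts keeping the lowest-cost run (K-Means only finds a local minimum of its cost, here taken as the mean squared distance of each point to its assigned centroid). The function names (kmeans, kmeans_best_of) and the toy data are illustrative, not from the original notes.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """One K-Means run: random init from K training examples, then
    alternating nearest-centroid assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick K distinct training examples as centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to the cluster with the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and cost: mean squared distance to the assigned centroid.
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    cost = np.mean(np.sum((X - centroids[labels]) ** 2, axis=1))
    return labels, centroids, cost

def kmeans_best_of(X, K, n_runs=10):
    """Run K-Means several times with different initializations and keep
    the solution with the lowest cost, since any single run may get stuck
    in a poor local minimum."""
    runs = [kmeans(X, K, seed=s) for s in range(n_runs)]
    return min(runs, key=lambda r: r[2])

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy data: three synthetic blobs.
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])
    labels, centroids, cost = kmeans_best_of(X, K=3)
    print("best cost:", round(cost, 3))
    print("centroids:\n", centroids.round(2))
```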
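A short elbow-method sketch, assuming scikit-learn is available: fit K-Means for a range of K and inspect the cost (scikit-learn exposes it as inertia_, the within-cluster sum of squared distances). The chosen K is where the cost starts decreasing only slowly.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: three Gaussian blobs, so the "elbow" should appear around K=3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])

# Fit K-Means for a range of K and record the cost (inertia = WSS/SSE).
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"K={k}: cost={km.inertia_:.1f}")
# Look for the elbow: the K after which the cost only decreases slowly.
```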
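A sketch of agglomerative hierarchical clustering under the three linkage criteria, assuming SciPy is available: 'single' corresponds to MIN (shortest inter-cluster distance), 'complete' to MAX (largest inter-cluster distance), and 'average' to group average. Cutting the merge tree (the dendrogram) at different levels yields different numbers of clusters; fcluster does the cutting here. The toy data and the choice of 3 clusters are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(30, 2)) for c in ([0, 0], [4, 4], [0, 4])])

# Agglomerative clustering under the three linkage criteria from the cards.
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method, metric="euclidean")  # the full merge tree
    # Cut the tree so that exactly 3 clusters remain; a different cut level
    # would give a different number of clusters.
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(method, "-> cluster sizes:", np.bincount(labels)[1:])
# scipy.cluster.hierarchy.dendrogram(Z) would draw the dendrogram for visual inspection.
```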
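A small NumPy sketch of DBSCAN's point types only: it classifies each point as core, border, or noise from the Eps/MinPts definitions, without running the full cluster-expansion step. The eps and min_pts values are illustrative, and the neighbor count here includes the point itself, which is one common convention.

```python
import numpy as np

def classify_points(X, eps=0.6, min_pts=4):
    """Label each point as 'core', 'border', or 'noise' using DBSCAN's definitions:
    core   - at least min_pts neighbors (counting itself) within radius eps,
    border - not core, but inside the eps-neighborhood of some core point,
    noise  - neither core nor border."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dists <= eps                              # boolean neighborhood matrix
    is_core = neighbors.sum(axis=1) >= min_pts
    is_border = ~is_core & (neighbors & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(is_border, "border", "noise"))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dense = rng.normal(loc=0.0, scale=0.3, size=(40, 2))  # a dense region
    sparse = rng.uniform(low=3.0, high=6.0, size=(5, 2))  # scattered outliers
    labels = classify_points(np.vstack([dense, sparse]))
    print({kind: int((labels == kind).sum()) for kind in ("core", "border", "noise")})
```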
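A short NumPy sketch of the two validity measures: cohesion as the within-cluster sum of squares (WSS/SSE) and separation as the between-cluster sum of squares (BSS). It also prints the total sum of squares (TSS) to show the standard decomposition WSS + BSS = TSS, which holds when each centroid is its cluster's mean.

```python
import numpy as np

def wss_bss(X, labels):
    """Cohesion (WSS) and separation (BSS) for a given clustering.
    WSS = sum over clusters of squared distances to the cluster mean.
    BSS = sum over clusters of |cluster| * squared distance from the
          cluster mean to the overall mean."""
    overall_mean = X.mean(axis=0)
    wss, bss = 0.0, 0.0
    for k in np.unique(labels):
        cluster = X[labels == k]
        centroid = cluster.mean(axis=0)
        wss += np.sum((cluster - centroid) ** 2)
        bss += len(cluster) * np.sum((centroid - overall_mean) ** 2)
    return wss, bss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in ([0, 0], [5, 5])])
    labels = np.repeat([0, 1], 50)
    wss, bss = wss_bss(X, labels)
    tss = np.sum((X - X.mean(axis=0)) ** 2)
    print(f"WSS={wss:.1f}  BSS={bss:.1f}  WSS+BSS={wss + bss:.1f}  TSS={tss:.1f}")
```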