Notes in Week 6 - Clustering

Status Last Update Fields
Published 11/26/2024 In supervised learning, the training set contains both input features {{c1::x}} and target labels {{c2::y}}.
Published 11/26/2024 Clustering involves grouping objects such that similar objects are in the same {{c1::group}}, while dissimilar ones are in {{c2::different groups}}.
Published 11/26/2024 Clustering can assist with {{c1::anomaly detection}} by learning normal data patterns and identifying deviations as potential issues.
Published 11/26/2024 In {{c1::partitional clustering}}, data objects are divided into non-overlapping subsets, with each object belonging to only {{c2::one subset}}.
Published 11/26/2024 In {{c1::hierarchical clustering}}, data objects are organized in a tree structure, allowing points to be grouped into clusters at multiple {{c2::levels}}.
Published 11/26/2024 The K-Means algorithm requires specifying the number of clusters {{c1::K}} before running.
Published 11/26/2024 Each cluster in K-Means is associated with a {{c1::centroid}}, and each point is assigned to the cluster with the closest {{c2::centroid}}.
Published 11/26/2024 The K-Means cost function minimizes the sum of squared distances between each point and its {{c1::assigned centroid}}.
Published 11/26/2024 K-Means does not guarantee finding the {{c1::global minimum}} of the objective function due to its sensitivity to initial centroid placement.
Published 11/26/2024 Random initialization in K-Means involves randomly picking {{c1::K}} training examples to set as initial centroids.
Published 11/26/2024 A common K-Means strategy is to run the algorithm multiple times with different initializations and pick the solution with the {{c1::lowest cost}}.
Published 11/26/2024 The optimal number of clusters in K-Means is often determined by examining when the cost function {{c1::decreases slowly}}, known as the {{c2::elbow method}} (sketched after the table).
Published 11/26/2024 K-Means struggles with clusters of varying sizes, densities, or non-{{c1::spherical shapes}}.
Published 11/26/2024 Hierarchical clustering results can be visualized with a {{c1::dendrogram}}, which shows how points are grouped into clusters at each level.
Published 11/26/2024 Hierarchical clustering does not require specifying the {{c1::number of clusters}} in advance.
Published 11/26/2024 In {{c1::agglomerative}} hierarchical clustering, points start as individual clusters and merge until only one or a specified number of clusters remain.
Published 11/26/2024 In {{c1::divisive}} hierarchical clustering, all points start in one cluster and are repeatedly split until reaching individual clusters or a set number of clusters.
Published 11/26/2024 Agglomerative clustering requires a {{c1::distance metric}} to measure the similarity between points or clusters.
Published 11/26/2024 Single linkage (MIN) in hierarchical clustering connects clusters based on the {{c1::shortest distance}} between points in each cluster.
Published 11/26/2024 Complete linkage (MAX) in hierarchical clustering connects clusters based on the {{c1::largest distance}} between points in each cluster.
Published 11/26/2024 Group average (average linkage) in hierarchical clustering is a {{c1::compromise}} between MIN and MAX linkage (see the linkage sketch after the table).
Published 11/26/2024 DBSCAN is a density-based clustering method where clusters are defined by {{c1::dense regions}} separated by sparse regions.
Published 11/26/2024 In DBSCAN, a point is a {{c1::core point}} if it has more than a specified number of neighbors (MinPts) within a radius (Eps).
Published 11/26/2024 In DBSCAN, a {{c1::border point}} has fewer than MinPts neighbors within Eps but is within the neighborhood of a core point.
Published 11/26/2024 A {{c1::noise point}} in DBSCAN is a point that is neither a core point nor a border point (see the DBSCAN sketch after the table).
Published 11/26/2024 Compared to K-Means, DBSCAN can handle clusters of varying {{c1::densities}} and {{c2::non-spherical shapes}}.
Published 11/26/2024 Cluster cohesion measures how {{c1::closely related}} objects within a cluster are.
Published 11/26/2024 Cluster separation measures how {{c1::distinct}} or well-separated a cluster is from other clusters.
Published 11/26/2024 Cluster cohesion is often calculated as the {{c1::within-cluster sum of squares (WSS)}}, also known as SSE.
Published 11/26/2024 Cluster separation is often calculated as the {{c1::between-cluster sum of squares (BSS)}} (see the WSS/BSS sketch after the table).
Published 11/26/2024 In cluster analysis, the {{c1::similarity matrix}} can be visualized to assess the organization of clusters.
Published 11/26/2024 Ordering the similarity matrix by cluster labels can provide hints about {{c1::cluster validity}}.
Published 11/26/2024 A key difference between supervised and unsupervised learning is that supervised learning requires {{c1::labeled data}}.
Published 11/26/2024 The K-Means algorithm repeatedly assigns points to the nearest {{c1::centroid}} and then updates each centroid to the mean of its assigned points (see the K-Means sketch after the table).
Published 11/26/2024 In hierarchical clustering, cutting the dendrogram at different levels provides {{c1::different numbers of clusters}}.
Published 11/26/2024 A benefit of hierarchical clustering over K-Means is that it can reveal {{c1::nested cluster structures}}.
Published 11/26/2024 The {{c1::choice of distance metric}} (e.g., Euclidean, Manhattan) affects the shape and structure of clusters in clustering algorithms.
Published 11/26/2024 In DBSCAN, the Eps parameter determines the {{c1::radius}} for defining dense regions, influencing the number and shape of clusters.
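
The sketches below illustrate several of the cards. First, a minimal NumPy sketch of the K-Means loop: random initialization from K training examples, repeated assignment of each point to its nearest centroid, centroid updates, and several restarts keeping the lowest-cost run (K-Means only finds a local minimum of its cost, here taken as the mean squared distance of each point to its assigned centroid). The function names (kmeans, kmeans_best_of) and the toy data are illustrative, not from the original notes.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """One K-Means run: random init from K training examples, then
    alternating nearest-centroid assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick K distinct training examples as centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to the cluster with the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and cost: mean squared distance to the assigned centroid.
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    cost = np.mean(np.sum((X - centroids[labels]) ** 2, axis=1))
    return labels, centroids, cost

def kmeans_best_of(X, K, n_runs=10):
    """Run K-Means several times with different initializations and keep
    the solution with the lowest cost, since any single run may get stuck
    in a poor local minimum."""
    runs = [kmeans(X, K, seed=s) for s in range(n_runs)]
    return min(runs, key=lambda r: r[2])

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy data: three synthetic blobs.
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])
    labels, centroids, cost = kmeans_best_of(X, K=3)
    print("best cost:", round(cost, 3))
    print("centroids:\n", centroids.round(2))
```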
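A short elbow-method sketch, assuming scikit-learn is available: fit K-Means for a range of K and inspect the cost (scikit-learn exposes it as inertia_, the within-cluster sum of squared distances). The chosen K is where the cost starts decreasing only slowly.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: three Gaussian blobs, so the "elbow" should appear around K=3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])

# Fit K-Means for a range of K and record the cost (inertia = WSS/SSE).
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"K={k}: cost={km.inertia_:.1f}")
# Look for the elbow: the K after which the cost only decreases slowly.
```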
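A sketch of agglomerative hierarchical clustering under the three linkage criteria, assuming SciPy is available: 'single' corresponds to MIN (shortest inter-cluster distance), 'complete' to MAX (largest inter-cluster distance), and 'average' to group average. Cutting the merge tree (the dendrogram) at different levels yields different numbers of clusters; fcluster does the cutting here. The toy data and the choice of 3 clusters are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(30, 2)) for c in ([0, 0], [4, 4], [0, 4])])

# Agglomerative clustering under the three linkage criteria from the cards.
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method, metric="euclidean")  # the full merge tree
    # Cut the tree so that exactly 3 clusters remain; a different cut level
    # would give a different number of clusters.
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(method, "-> cluster sizes:", np.bincount(labels)[1:])
# scipy.cluster.hierarchy.dendrogram(Z) would draw the dendrogram for visual inspection.
```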
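A small NumPy sketch of DBSCAN's point types only: it classifies each point as core, border, or noise from the Eps/MinPts definitions, without running the full cluster-expansion step. The eps and min_pts values are illustrative, and the neighbor count here includes the point itself, which is one common convention.

```python
import numpy as np

def classify_points(X, eps=0.6, min_pts=4):
    """Label each point as 'core', 'border', or 'noise' using DBSCAN's definitions:
    core   - at least min_pts neighbors (counting itself) within radius eps,
    border - not core, but inside the eps-neighborhood of some core point,
    noise  - neither core nor border."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dists <= eps                              # boolean neighborhood matrix
    is_core = neighbors.sum(axis=1) >= min_pts
    is_border = ~is_core & (neighbors & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(is_border, "border", "noise"))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dense = rng.normal(loc=0.0, scale=0.3, size=(40, 2))  # a dense region
    sparse = rng.uniform(low=3.0, high=6.0, size=(5, 2))  # scattered outliers
    labels = classify_points(np.vstack([dense, sparse]))
    print({kind: int((labels == kind).sum()) for kind in ("core", "border", "noise")})
```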
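A short NumPy sketch of the two validity measures: cohesion as the within-cluster sum of squares (WSS/SSE) and separation as the between-cluster sum of squares (BSS). It also prints the total sum of squares (TSS) to show the standard decomposition WSS + BSS = TSS, which holds when each centroid is its cluster's mean.

```python
import numpy as np

def wss_bss(X, labels):
    """Cohesion (WSS) and separation (BSS) for a given clustering.
    WSS = sum over clusters of squared distances to the cluster mean.
    BSS = sum over clusters of |cluster| * squared distance from the
          cluster mean to the overall mean."""
    overall_mean = X.mean(axis=0)
    wss, bss = 0.0, 0.0
    for k in np.unique(labels):
        cluster = X[labels == k]
        centroid = cluster.mean(axis=0)
        wss += np.sum((cluster - centroid) ** 2)
        bss += len(cluster) * np.sum((centroid - overall_mean) ** 2)
    return wss, bss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in ([0, 0], [5, 5])])
    labels = np.repeat([0, 1], 50)
    wss, bss = wss_bss(X, labels)
    tss = np.sum((X - X.mean(axis=0)) ** 2)
    print(f"WSS={wss:.1f}  BSS={bss:.1f}  WSS+BSS={wss + bss:.1f}  TSS={tss:.1f}")
```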