Result: Selector: A General Python Library for Diverse Subset Selection.

Title:
Selector: A General Python Library for Diverse Subset Selection.
Authors:
Meng F; Department of Chemistry, Queen's University, 90 Bader Lane, Kingston, Ontario K7L 3N6, Canada.; Department of Chemistry and Chemical Biology, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8, Canada.
Martínez González M; Department of Chemistry and Chemical Biology, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8, Canada.
Chuiko V; Department of Chemistry and Chemical Biology, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8, Canada.
Tehrani A; Department of Chemistry, Queen's University, 90 Bader Lane, Kingston, Ontario K7L 3N6, Canada.
Al Nabulsi AR; Department of Chemistry and Chemical Biology, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8, Canada.
Broscius A; Department of Chemistry, Queen's University, 90 Bader Lane, Kingston, Ontario K7L 3N6, Canada.
Khaleel H; Department of Chemistry and Chemical Biology, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8, Canada.
López-Pérez K; Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32603, United States.
Miranda-Quintana RA; Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32603, United States.
Ayers PW; Department of Chemistry and Chemical Biology, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8, Canada.
Heidar-Zadeh F; Department of Chemistry, Queen's University, 90 Bader Lane, Kingston, Ontario K7L 3N6, Canada.
Source:
Journal of chemical information and modeling [J Chem Inf Model] 2026 Jan 27. Date of Electronic Publication: 2026 Jan 27.
Publication Model:
Ahead of Print
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: American Chemical Society
Country of Publication: United States
NLM ID: 101230060
Publication Model: Print-Electronic
Cited Medium: Internet
ISSN: 1549-960X (Electronic)
Linking ISSN: 15499596
NLM ISO Abbreviation: J Chem Inf Model
Subsets: MEDLINE
Imprint Name(s):
Original Publication: Washington, D.C. : American Chemical Society, c2005-
Entry Date(s):
Date Created: 20260127
Latest Revision: 20260127
Update Code:
20260127
DOI:
10.1021/acs.jcim.5c01499
PMID:
41591801
Database:
MEDLINE

Further Information

Selector is a free, open-source Python library for selecting diverse subsets from any dataset, which makes it applicable across a wide range of domains. It implements subset sampling algorithms based on sample distance, similarity, and spatial partitioning, along with metrics to quantify subset diversity. The library is flexible and integrates with popular Python libraries such as scikit-learn, so the implemented algorithms interoperate with existing data analysis workflows. Selector is operating-system-agnostic, accessible, and easily extensible, and it is developed with modern software practices, including version control, unit testing, and continuous integration. Interactive quick-start notebooks, also accessible on the web, provide tutorials for all skill levels and showcase applications in computational chemistry, drug discovery, and chemical library design. A web interface additionally lets users upload datasets, configure sampling settings, and run subset selection algorithms with no programming required. This work serves as the official release note for the Selector package, offering a technical overview of its features, use cases, and the development practices that ensure its quality and maintainability.
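
To make the idea of distance-based diverse subset selection concrete, here is a minimal sketch of one common strategy, greedy MaxMin, paired with a simple diversity score (the smallest pairwise distance within the subset). The abstract does not name the specific algorithms or metrics Selector implements, so this is not Selector's API: the function names `maxmin_select` and `min_pairwise_distance`, the Euclidean distance, and the toy data are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def maxmin_select(points: Sequence[Point], size: int, start: int = 0) -> List[int]:
    """Greedy MaxMin: return indices of `size` mutually distant points.

    Hypothetical helper for illustration; not part of Selector's API.
    """
    if not 0 < size <= len(points):
        raise ValueError("size must be between 1 and len(points)")
    selected = [start]
    chosen = {start}
    # Distance from every point to its nearest already-selected point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < size:
        # Among unselected points, pick the one farthest from the subset.
        candidates = [i for i in range(len(points)) if i not in chosen]
        best = max(candidates, key=nearest.__getitem__)
        selected.append(best)
        chosen.add(best)
        # Refresh nearest-selected distances to account for the new member.
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[best]))
    return selected


def min_pairwise_distance(points: Sequence[Point], indices: Sequence[int]) -> float:
    """Diversity score: smallest distance between any two selected points.

    Requires at least two indices.
    """
    return min(
        math.dist(points[i], points[j])
        for k, i in enumerate(indices)
        for j in indices[k + 1:]
    )


# Toy data (assumed): two tight clusters plus one outlier.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
subset = maxmin_select(data, size=3)
print(subset, min_pairwise_distance(data, subset))
```

On this toy data the sketch picks one point from each cluster plus the outlier, rather than several near-duplicates from the same cluster, which is the behavior a diversity-oriented selector aims for. The result depends on the arbitrary starting index.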