Taylor & Francis Group

A New Projection Pursuit Index for Big Data

dataset
posted on 2025-02-21, 20:01 authored by Yajie Duan, Javier Cabrera, Birol Emir

Visualizing extremely large datasets, in either static or dynamic form, is challenging because most traditional methods do not scale to big-data problems. To address this challenge, a novel visualization approach for big data is proposed based on Projection Pursuit (PP), Grand and Guided Tours, and Data Nuggets methods. The aim of this new methodology is to discover hidden structures such as clusters, outliers, and other nonlinear structures within large datasets. The Guided Tour, a dynamic graphical tool for high-dimensional data, combines Projection Pursuit and Grand Tour techniques: it navigates the data space by presenting a dynamic sequence of low-dimensional projections selected by PP index functions. Although various PP indices have been developed in the past, computational constraints arise when the original Guided Tour approach is applied to big-data scenarios. A new PP index is developed that is computable for big data, with the help of a data compression method called “Data Nuggets” that reduces large datasets while maintaining the original data structure. The effectiveness of the proposed methodology is demonstrated on simulated datasets, and a real-world big-data application is presented to illustrate the new method. Static and dynamic graphical tools based on the proposed PP index hold promise for detecting nonlinear structures in big data and can offer valuable insights for big-data analysis.
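
To make the general idea in the abstract concrete, the following is a minimal Python sketch of evaluating a projection pursuit index on compressed, weighted data instead of on every raw observation. It is not the authors' index and not the Data Nuggets algorithm: the function `compress` is a hypothetical stand-in that groups rows around randomly chosen seed rows, the index is a simple weighted-kurtosis measure chosen only for illustration, and the random search replaces a proper Guided Tour optimizer. All function and variable names are assumptions.

```python
import numpy as np


def compress(X, n_centers, rng, chunk=50_000):
    """Hypothetical stand-in for a data compression step (NOT Data Nuggets).

    Assigns every row to its nearest randomly chosen seed row and returns
    the group means (centers) and group sizes (weights).
    """
    seeds = X[rng.choice(len(X), size=n_centers, replace=False)]
    seed_sq = (seeds ** 2).sum(axis=1)
    labels = np.empty(len(X), dtype=np.int64)
    for start in range(0, len(X), chunk):
        block = X[start:start + chunk]
        # Squared distance to each seed, up to a per-row constant that
        # does not affect which seed is nearest.
        d2 = seed_sq[None, :] - 2.0 * block @ seeds.T
        labels[start:start + chunk] = d2.argmin(axis=1)
    weights = np.bincount(labels, minlength=n_centers).astype(float)
    sums = np.zeros((n_centers, X.shape[1]))
    np.add.at(sums, labels, X)  # accumulate row sums per group
    keep = weights > 0
    return sums[keep] / weights[keep, None], weights[keep]


def kurtosis_index(centers, weights, direction):
    """Illustrative 1-D index: |weighted kurtosis - 3| of the projection.

    Uses centers only, so within-group spread is ignored (a simplification).
    """
    a = direction / np.linalg.norm(direction)
    z = centers @ a
    w = weights / weights.sum()
    mu = w @ z
    var = w @ (z - mu) ** 2
    kurt = (w @ (z - mu) ** 4) / var ** 2
    return abs(kurt - 3.0)


def random_search(centers, weights, n_tries, rng):
    """Crude optimizer: keep the best of n_tries random unit directions."""
    best_dir, best_val = None, -np.inf
    for _ in range(n_tries):
        a = rng.normal(size=centers.shape[1])
        a /= np.linalg.norm(a)
        val = kurtosis_index(centers, weights, a)
        if val > best_val:
            best_dir, best_val = a, val
    return best_dir, best_val


# Simulated data: two clusters separated along the first coordinate.
rng = np.random.default_rng(0)
n, d = 20_000, 5
X = rng.normal(size=(n, d))
X[: n // 2, 0] -= 3.0
X[n // 2:, 0] += 3.0

centers, weights = compress(X, 200, rng)
idx_signal = kurtosis_index(centers, weights, np.eye(d)[0])  # cluster axis
idx_noise = kurtosis_index(centers, weights, np.eye(d)[1])   # pure noise axis
best_dir, best_val = random_search(centers, weights, 500, rng)
```

In this sketch each index evaluation costs time proportional to the number of compressed groups rather than the number of raw rows, which is the property that makes repeated evaluation during a tour feasible. The weights keep large groups from being treated the same as small ones.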
