A Brief Introduction to Exploratory Data Analysis


  • Xuanzhe Song




EDA; dataset; Colab; visualization.


Exploratory data analysis (EDA) is the process of tidying, processing, and analyzing an acquired dataset. It is part of data science, an interdisciplinary field that combines scientific methods, systems, and processes from statistics, information science, and computer science to draw insight from structured or unstructured data. This article describes the EDA process in terms of its purpose and methods, covering the upload, tidying, and visualization of data. As a worked example, a dataset of world Internet users was processed, leading to the conclusion that Internet users account for more than 50% of the population in each of Africa, the Americas, and Europe, more than 30% in each of Asia and Oceania, and 10% in the Middle East. The example shows the basic profile of Internet users in each region, and the differences between regions, clearly and intuitively. Using data visualization in this way to make data usable is common in all walks of life.
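The upload, tidy, and visualize steps summarized above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration and not the article's actual notebook: the column names (region, population, internet_users), the helper functions, and all figures in RAW_CSV are hypothetical and made up for the example. In a Colab notebook, pandas and matplotlib would likely be the more usual tools, but that is an assumption, since the abstract does not name the libraries used.

```python
import csv
import io

# Hypothetical, made-up figures for illustration only; NOT the article's dataset.
RAW_CSV = """region,population,internet_users
 North ,"1,000",600
South,2000,500
East,,300
"""


def load_rows(text):
    """Upload step: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def tidy(rows):
    """Tidy step: trim names, drop thousands separators, skip incomplete rows."""
    cleaned = []
    for row in rows:
        name = row["region"].strip()
        pop = row["population"].replace(",", "").strip()
        users = row["internet_users"].replace(",", "").strip()
        if not (name and pop and users):
            continue  # a missing value makes the share undefined, so skip the row
        cleaned.append(
            {"region": name, "population": int(pop), "internet_users": int(users)}
        )
    return cleaned


def penetration(rows):
    """Analysis step: Internet users as a percentage of each region's population."""
    return {r["region"]: 100 * r["internet_users"] / r["population"] for r in rows}


def text_bar_chart(shares, width=40):
    """Visualization step: one text bar per region; 100% would fill `width`."""
    lines = []
    for region, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(pct / 100 * width)
        lines.append(f"{region:<8}{bar} {pct:.1f}%")
    return "\n".join(lines)


shares = penetration(tidy(load_rows(RAW_CSV)))
print(text_bar_chart(shares))
```

Dividing users by population, rather than comparing raw user counts, is what makes regions of very different sizes comparable, and it is the same kind of per-region percentage that the conclusions above are stated in.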