Monday, August 13, 2018

Python Libraries for Data Science

Many popular Python toolboxes/libraries:
• NumPy
• SciPy
• Pandas
• SciKit-Learn
Visualization libraries:
• matplotlib
• Seaborn
and many more …

NumPy:

  • introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
  • provides vectorized mathematical operations on arrays and matrices, which are typically much faster than equivalent explicit Python loops
  • many other Python libraries are built on NumPy
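
As a rough illustration of the points above, here is a minimal sketch of array creation, a vectorized operation, and a simple statistic. The sample values are made up for the example.

```python
import numpy as np

# Hypothetical sample data: a 2-D array with 2 rows and 3 columns.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Vectorized arithmetic: applied element-wise, no explicit Python loop.
doubled = a * 2

# Statistical reductions, either per axis or over the whole array.
col_means = a.mean(axis=0)  # mean of each column
total = a.sum()             # sum of all elements

print(doubled)
print(col_means)
print(total)
```

The same element-wise style works for most NumPy functions (np.sqrt, np.exp and so on), which is where the speedup over plain loops usually comes from.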



SciPy:

  • collection of algorithms for linear algebra, differential equations, numerical integration, optimization, statistics and more
  • part of the SciPy Stack
  • built on NumPy
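
To make the list above a little more concrete, the sketch below tries two of the named areas, numerical integration and optimization. The integrand and the quadratic function are arbitrary choices for illustration only.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: integral of sin(x) from 0 to pi (exact value is 2).
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: minimize a simple quadratic whose minimum is at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

print(area)      # close to 2
print(result.x)  # close to 3
```

Note that the inputs are ordinary NumPy functions and floats, which reflects how SciPy builds on NumPy.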



Pandas:

  • adds data structures (Series and DataFrame) and tools designed to work with table-like data (similar to data frames in R)
  • provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation etc.
  • allows handling missing data
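
A small sketch of the operations listed above, assuming a made-up sales table (the column names and values are hypothetical). It shows one way to handle a missing value, sort, and aggregate.

```python
import numpy as np
import pandas as pd

# Hypothetical table-like data with one missing value (NaN).
df = pd.DataFrame({
    "city": ["Boston", "Boston", "Austin", "Austin"],
    "year": [2017, 2018, 2017, 2018],
    "sales": [10.0, np.nan, 7.0, 9.0],
})

# Handling missing data: here the NaN is replaced by 0.0
# (dropping the row with dropna() would be another option).
filled = df.fillna({"sales": 0.0})

# Sorting and aggregation.
sorted_df = filled.sort_values("sales", ascending=False)
totals = filled.groupby("city")["sales"].sum()

print(sorted_df)
print(totals)
```

Whether to fill or drop missing values depends on the analysis; filling with zero is used here only to keep the example short.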


SciKit-Learn:

  • provides machine learning algorithms: classification, regression, clustering, model validation etc.
  • built on NumPy, SciPy and matplotlib
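
As a hedged example of classification plus a basic validation step, the sketch below fits a logistic regression model on the iris dataset bundled with scikit-learn. The choice of model, dataset, and split ratio is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to validate the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a classifier and measure accuracy on the held-out data.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(accuracy)
```

Most scikit-learn estimators follow the same fit / predict / score pattern, so swapping in another model usually changes only the line that constructs it.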


Matplotlib:

  • Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats
  • a set of functionalities similar to those of MATLAB
  • line plots, scatter plots, bar charts, histograms, pie charts etc.
  • relatively low-level; some effort needed to create advanced visualization
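
The following sketch draws a line plot and a histogram side by side and renders the figure to PNG. The data is synthetic, and the non-interactive "Agg" backend is chosen only so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(0)  # seeded generator for synthetic data

# One figure with two axes: a line plot and a histogram.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.legend()
ax_hist.hist(rng.normal(size=500), bins=20)
ax_hist.set_title("Histogram")
fig.tight_layout()

# Render to an in-memory PNG; passing a filename such as "figure.pdf"
# to savefig would write a file in that format instead.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

Labels, legends, and layout are all set explicitly here, which is what "relatively low-level" means in practice.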



Seaborn:

  • Based on matplotlib
  • Provides a high-level interface for drawing attractive statistical graphics
  • Similar (in style) to the popular ggplot2 library in R
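
A brief sketch of the high-level interface: a single call draws a box plot grouped by a column of a DataFrame. The data frame and its column names are invented for the example, and the "Agg" backend is again an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two groups of 50 values with different means.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 50),
    "value": np.concatenate([rng.normal(0, 1, 50), rng.normal(1, 1, 50)]),
})

# One call produces a grouped box plot with labeled axes.
ax = sns.boxplot(data=df, x="group", y="value")
ax.set_title("Value by group")
```

The returned object is a regular matplotlib Axes, so anything that works in matplotlib can still be applied afterwards.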
_________________________________________________________________________
Download tutorial Notebook:
