Introduction to Pandas for Data Analysis

Page 1: What is Pandas?

Pandas is an open-source Python library for data manipulation and analysis. It provides fast, flexible data structures and analysis tools for working with tabular and labeled data in Python.

The core data structure in Pandas is the DataFrame, a two-dimensional, table-like object whose columns can each hold a different type of data, such as numbers, strings, or dates. Each column of a DataFrame is a Series, a one-dimensional labeled array. DataFrames can be created from many sources, including CSV files, Excel spreadsheets, and databases.
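
As a rough illustration of loading a DataFrame from CSV data, the sketch below uses pd.read_csv. The CSV text is a made-up sample held in memory so the snippet runs on its own; with a real file you would pass its path to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content, kept in a string so the example is self-contained.
csv_text = """Name,Age,City
John,30,New York
Jane,25,London
Peter,35,Paris
"""

# read_csv accepts a file path or any file-like object such as StringIO.
people = pd.read_csv(io.StringIO(csv_text))

# Each column gets its own inferred data type.
print(people.dtypes)
```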

Pandas offers a wide range of functions for data manipulation, including:

Filtering and selecting data
Sorting and indexing data
Adding and removing rows and columns
Transforming and aggregating data
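
To make one of the items above concrete, here is a minimal sketch of adding and removing columns and rows. The column name AgeInMonths and the sample values are illustrative assumptions, not part of any dataset.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Peter'], 'Age': [30, 25, 35]})

# Add a column derived from an existing one (hypothetical column name).
df['AgeInMonths'] = df['Age'] * 12

# Remove a column; drop returns a new DataFrame and leaves df unchanged.
trimmed = df.drop(columns=['AgeInMonths'])

# Remove a row by its index label.
without_first = df.drop(index=0)

print(trimmed)
print(without_first)
```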

In addition to data manipulation, Pandas also provides functions for data analysis, such as:

Descriptive statistics
Data visualization (through Matplotlib-based plotting methods)
Time series analysis
Data cleaning and preparation
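
As a small sketch of two of these tasks, the example below computes descriptive statistics and fills a missing value. The sample ages, including the missing entry, are invented for illustration, and filling with the mean is only one of several possible cleaning choices.

```python
import pandas as pd

# Hypothetical ages with one missing value (stored as NaN).
ages = pd.Series([30, 25, None, 35], name='Age')

# describe() reports count, mean, std, min, quartiles, and max,
# skipping missing values.
print(ages.describe())

# One simple cleaning strategy: replace missing values with the mean.
cleaned = ages.fillna(ages.mean())
print(cleaned)
```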

Page 2: Getting Started with Pandas

To start using Pandas, install it by running the following command in your terminal:

pip install pandas

Once Pandas is installed, import it in your Python script:

import pandas as pd

The pd alias is a common convention used to refer to Pandas in Python code.

Creating a DataFrame

You can create a DataFrame from a list of dictionaries:

        data = [{'Name': 'John', 'Age': 30, 'City': 'New York'},
                  {'Name': 'Jane', 'Age': 25, 'City': 'London'},
                  {'Name': 'Peter', 'Age': 35, 'City': 'Paris'}]

        df = pd.DataFrame(data)
        print(df)
    

Output:

              Name  Age      City
        0     John   30  New York
        1     Jane   25    London
        2    Peter   35     Paris
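
A DataFrame can also be built from a dictionary that maps column names to lists of values. The sketch below constructs the same table both ways and compares them; the variable names are illustrative.

```python
import pandas as pd

# Row-oriented input: one dictionary per row.
rows = [{'Name': 'John', 'Age': 30, 'City': 'New York'},
        {'Name': 'Jane', 'Age': 25, 'City': 'London'},
        {'Name': 'Peter', 'Age': 35, 'City': 'Paris'}]

# Column-oriented input: one list per column.
columns = {'Name': ['John', 'Jane', 'Peter'],
           'Age': [30, 25, 35],
           'City': ['New York', 'London', 'Paris']}

df_from_rows = pd.DataFrame(rows)
df_from_columns = pd.DataFrame(columns)

# Check whether both constructions hold the same data.
print(df_from_rows.equals(df_from_columns))
```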
    

Accessing Data

You can access data in a DataFrame by column name, by row position with iloc, or by row and column label with loc:

        print(df['Name'])  # Accessing a column
        print(df.iloc[0])  # Accessing the first row
        print(df.loc[0, 'Name'])  # Accessing a specific cell
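
With the default index, row labels and row positions are both 0, 1, 2, so loc and iloc look interchangeable. The sketch below uses names as the index (an assumption made only for this illustration) to show the difference.

```python
import pandas as pd

# Same sample data, but indexed by name instead of 0, 1, 2.
df = pd.DataFrame(
    {'Age': [30, 25, 35], 'City': ['New York', 'London', 'Paris']},
    index=['John', 'Jane', 'Peter'],
)

first_by_position = df.iloc[0]        # first row, whatever its label is
jane_by_label = df.loc['Jane']        # row whose index label is 'Jane'
peter_city = df.loc['Peter', 'City']  # one cell, by row and column label

print(first_by_position)
print(jane_by_label)
print(peter_city)
```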
    

Page 3: Data Manipulation with Pandas

Filtering Data

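You can filter data in a DataFrame using boolean indexing. The comparison df['Age'] > 30 produces a True or False value for each row, and indexing the DataFrame with that result keeps only the rows marked True: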

        filtered_df = df[df['Age'] > 30]
        print(filtered_df)
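
Conditions can be combined as well. The following sketch assumes the same sample data and shows two common patterns: joining conditions with & (each condition needs its own parentheses) and matching against a list of values with isin.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Peter'],
                   'Age': [30, 25, 35],
                   'City': ['New York', 'London', 'Paris']})

# Combine conditions with & (and) or | (or); parentheses are required
# because these operators bind more tightly than comparisons.
older_outside_paris = df[(df['Age'] >= 30) & (df['City'] != 'Paris')]

# Keep rows whose City appears in a list of allowed values.
in_europe = df[df['City'].isin(['London', 'Paris'])]

print(older_outside_paris)
print(in_europe)
```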
    

Sorting Data

You can sort data in a DataFrame by one or more columns:

        sorted_df = df.sort_values(by='Age', ascending=False)
        print(sorted_df)
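
To sort by more than one column, pass lists to by and ascending. In this sketch a fourth person, Anna, is a hypothetical addition to the sample data so that two rows share the same age and the second sort key has an effect.

```python
import pandas as pd

# Sample data plus one hypothetical row (Anna) to create a tie on Age.
df = pd.DataFrame({'Name': ['John', 'Jane', 'Peter', 'Anna'],
                   'Age': [30, 25, 35, 30],
                   'City': ['New York', 'London', 'Paris', 'Berlin']})

# Sort by Age from oldest to youngest, then by Name alphabetically
# within each age.
sorted_df = df.sort_values(by=['Age', 'Name'], ascending=[False, True])
print(sorted_df)
```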
    

Aggregating Data

You can aggregate data by grouping rows with groupby and then applying an aggregation function such as mean:

        print(df.groupby('City')['Age'].mean())

This code calculates the average age for each city in the DataFrame.
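
Several statistics can be computed per group in one step with agg. In the sketch below, Anna is a hypothetical extra row added so that one city contains more than one person; the rest matches the sample data.

```python
import pandas as pd

# Sample data plus one hypothetical row (Anna) so London has two people.
df = pd.DataFrame({'Name': ['John', 'Jane', 'Peter', 'Anna'],
                   'Age': [30, 25, 35, 45],
                   'City': ['New York', 'London', 'Paris', 'London']})

# Compute several statistics of Age for each city at once.
summary = df.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```

The result has one row per city and one column per statistic, with cities sorted alphabetically by default.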
