Pandas is an open-source Python library for data manipulation and analysis. It provides high-performance data structures and analysis tools that are widely used for working with tabular data in Python.
The core data structure in Pandas is the DataFrame, a two-dimensional table of rows and labeled columns. Each column has its own data type (dtype), so a single DataFrame can mix numbers, strings, and dates. DataFrames can be built from Python objects or loaded from sources such as CSV files, Excel spreadsheets, and SQL databases (for example with pd.read_csv, pd.read_excel, and pd.read_sql).
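As a minimal sketch of loading tabular data, the example below parses CSV text with pd.read_csv. To keep it self-contained, the CSV is held in memory with io.StringIO; the data is illustrative, and with a real file you would pass a path instead. It also shows that each column gets its own dtype.

```python
import io

import pandas as pd

# Illustrative CSV content kept in memory; with a real file you would pass
# a path (e.g. a hypothetical "people.csv") to pd.read_csv instead.
csv_text = """Name,Age,City
John,30,New York
Jane,25,London
Peter,35,Paris
"""

df = pd.read_csv(io.StringIO(csv_text))

# Each column has its own dtype: Age is parsed as integers,
# while Name and City are parsed as text.
print(df.dtypes)
print(df)
```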
Pandas offers a wide range of functions for data manipulation, including:
In addition to data manipulation, Pandas also provides functions for data analysis, such as:
Before using Pandas, install it by running the following command in your terminal:
pip install pandas
Once Pandas is installed, import it into your Python script:
import pandas as pd
The pd alias is a widely used convention for referring to Pandas in Python code.
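A quick way to confirm that the installation worked is to import Pandas and print its version string, which pandas exposes as pd.__version__. This is only a sanity check; the exact version printed depends on your environment.

```python
import pandas as pd

# If this import succeeds, Pandas is installed in the active environment.
# __version__ reports which release is being imported.
print(pd.__version__)
```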
You can create a DataFrame from a list of dictionaries:
data = [{'Name': 'John', 'Age': 30, 'City': 'New York'},
        {'Name': 'Jane', 'Age': 25, 'City': 'London'},
        {'Name': 'Peter', 'Age': 35, 'City': 'Paris'}]
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age      City
0   John   30  New York
1   Jane   25    London
2  Peter   35     Paris
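The same table can also be built column by column from a dictionary of lists. The sketch below constructs both versions and checks with DataFrame.equals that they match; the variable names are illustrative.

```python
import pandas as pd

# The same three records, organized column by column instead of row by row.
data_by_column = {
    "Name": ["John", "Jane", "Peter"],
    "Age": [30, 25, 35],
    "City": ["New York", "London", "Paris"],
}
df_columns = pd.DataFrame(data_by_column)

# The row-oriented list of dictionaries used above, rebuilt for comparison.
data_by_row = [
    {"Name": "John", "Age": 30, "City": "New York"},
    {"Name": "Jane", "Age": 25, "City": "London"},
    {"Name": "Peter", "Age": 35, "City": "Paris"},
]
df_rows = pd.DataFrame(data_by_row)

# Both inputs produce identical DataFrames (same columns, index, and values).
print(df_columns.equals(df_rows))
```

A dictionary of lists is often more convenient when data already arrives column by column, while a list of dictionaries suits record-style data such as parsed JSON.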
You can access data by column name, by integer position with iloc, or by row label with loc:
print(df['Name'])         # Select the 'Name' column (returns a Series)
print(df.iloc[0])         # Select the first row by integer position
print(df.loc[0, 'Name'])  # Select a single cell by row label 0 and column 'Name'
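With the default index (0, 1, 2), row labels and positions coincide, which hides the difference between loc and iloc. The sketch below gives the rows arbitrary illustrative labels (10, 20, 30) so the two accessors behave visibly differently.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Jane", "Peter"], "Age": [30, 25, 35]},
    index=[10, 20, 30],  # illustrative, non-default row labels
)

# iloc selects by integer position: position 0 is the first row (label 10).
first_by_position = df.iloc[0]["Name"]

# loc selects by label: label 20 is the second row.
second_by_label = df.loc[20, "Name"]

print(first_by_position, second_by_label)

# df.loc[0, "Name"] would raise a KeyError here, because 0 is not a row label.
```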
You can filter data in a DataFrame using boolean indexing:
filtered_df = df[df['Age'] > 30]  # Keep only rows where Age is greater than 30
print(filtered_df)
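Several conditions can be combined in one boolean mask. The sketch below uses & (and); | (or) and ~ (not) work the same way. Each comparison needs its own parentheses because these operators bind more tightly than comparisons in Python. It also shows DataFrame.query, which expresses the same filter as a string.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Peter"],
    "Age": [30, 25, 35],
    "City": ["New York", "London", "Paris"],
})

# Rows where Age is at least 30 AND the city is not Paris.
mask = (df["Age"] >= 30) & (df["City"] != "Paris")
by_mask = df[mask]
print(by_mask)  # only John satisfies both conditions

# The same filter written as a query string.
by_query = df.query("Age >= 30 and City != 'Paris'")
print(by_query)
```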
You can sort data in a DataFrame by one or more columns:
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
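To sort by more than one column, pass a list to by, and optionally a matching list to ascending. In the sketch below, the extra row for "Anna" is hypothetical and exists only to create a tie on Age, so the second sort key has something to decide.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Peter", "Anna"],  # "Anna" is a hypothetical extra row
    "Age": [30, 25, 35, 30],
    "City": ["New York", "London", "Paris", "Berlin"],
})

# Sort by Age (descending); rows with equal Age are then ordered by Name (ascending).
sorted_df = df.sort_values(by=["Age", "Name"], ascending=[False, True])
print(sorted_df)
```

John and Anna share Age 30, so the secondary key places Anna before John.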
You can group rows by the values in a column and aggregate each group with built-in functions such as mean:
print(df.groupby('City')['Age'].mean())
This code calculates the average age for each city in the DataFrame.
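In the example DataFrame every city appears only once, so each group's mean is simply that person's age. The sketch below uses hypothetical extra rows so that groups contain several people. It also applies several aggregations at once with agg.

```python
import pandas as pd

# Hypothetical data with repeated cities, so each group has more than one row.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Peter", "Anna", "Luis"],
    "Age": [30, 25, 35, 40, 28],
    "City": ["New York", "London", "Paris", "London", "Paris"],
})

# agg computes several statistics per group; group keys are sorted by default.
summary = df.groupby("City")["Age"].agg(["mean", "min", "max", "count"])
print(summary)
```

For London, for instance, the mean is (25 + 40) / 2 = 32.5.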