148. Using pandas for Data Analysis

The pandas library is one of the most widely used Python tools for data analysis and manipulation. Its two core data structures, Series (a one-dimensional labeled array) and DataFrame (a two-dimensional labeled table), handle structured data such as database tables or spreadsheets.

The following 10 sections cover common data analysis tasks in pandas:

1. Creating a DataFrame

Creating a DataFrame from a dictionary of lists.

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami']
}

df = pd.DataFrame(data)
print(df)

Explanation:

  • A DataFrame is created from a dictionary, where the keys are the column names and the values are lists of data.

2. Reading Data from a CSV File

Reading a CSV file into a DataFrame.

import pandas as pd

df = pd.read_csv('data.csv')  # Replace 'data.csv' with your file path
print(df.head())  # Display the first 5 rows of the DataFrame

Explanation:

  • pd.read_csv() loads data from a CSV file into a DataFrame.

3. DataFrame Selection and Indexing

Selecting a single column or multiple columns from a DataFrame.

Explanation:

  • Use df['column_name'] for selecting a single column and df[['col1', 'col2']] for selecting multiple columns.
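
As a minimal sketch of the two selection forms above, the snippet below reuses the sample data from snippet 1. The variable names ages and name_city are illustrative only.

```python
import pandas as pd

# Same sample data as in snippet 1
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami']
})

ages = df['Age']                  # single brackets return a Series
name_city = df[['Name', 'City']]  # a list of column names returns a DataFrame

print(ages)
print(name_city)
```

Note the double brackets in the second form: the outer pair indexes the DataFrame and the inner pair is an ordinary Python list of column names.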

4. Filtering Data

Filtering data based on conditions.

Explanation:

  • You can filter a DataFrame by applying a condition on columns like df[df['Age'] > 23].
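
The condition above can be sketched as follows, again using the sample data from snippet 1. The second filter, which combines two conditions, is an added illustration and not part of the original text.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami']
})

# df['Age'] > 23 produces a boolean Series; indexing with it keeps the True rows
older = df[df['Age'] > 23]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
older_not_miami = df[(df['Age'] > 23) & (df['City'] != 'Miami')]

print(older)
print(older_not_miami)
```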

5. Handling Missing Data

Handling missing or NaN values in a DataFrame.

Explanation:

  • df.fillna(value) returns a copy of the DataFrame with NaN values replaced by the specified value. To remove rows that contain NaN instead, use df.dropna().
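
A short sketch of handling missing values, assuming a variant of the sample data with two gaps introduced for illustration. Filling 'Age' with the column mean and 'City' with 'Unknown' is one possible choice, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Sample data with missing values added for illustration
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, np.nan, 22, 32],
    'City': ['New York', 'Los Angeles', None, 'Miami']
})

# Count the missing values in each column
print(df.isna().sum())

# Fill each column with its own replacement value
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})

# Alternatively, drop every row that contains a missing value
dropped = df.dropna()

print(filled)
print(dropped)
```

Both fillna() and dropna() return new DataFrames here, so the original df is left unchanged.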

6. Grouping Data

Grouping data by one or more columns and performing aggregation.

Explanation:

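  • df.groupby('City') groups the rows by the 'City' column. Selecting a numeric column and calling an aggregation, as in df.groupby('City')['Age'].mean(), returns one value per group. In pandas 2.0 and later, calling mean() on the whole grouped DataFrame raises an error when non-numeric columns such as 'Name' are present, unless numeric_only=True is passed.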
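
The grouping described above might look like the following sketch. The data is hypothetical and differs from snippet 1 so that some cities appear more than once.

```python
import pandas as pd

# Hypothetical data with repeated cities so that groups have more than one row
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 27, 22, 32, 30],
    'City': ['New York', 'Chicago', 'Chicago', 'New York', 'Miami']
})

# Mean age per city; selecting 'Age' keeps non-numeric columns out of the aggregation
mean_age = df.groupby('City')['Age'].mean()

# Several aggregations at once
summary = df.groupby('City')['Age'].agg(['mean', 'max', 'count'])

print(mean_age)
print(summary)
```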

7. Sorting Data

Sorting a DataFrame by one or more columns.

Explanation:

  • df.sort_values('column_name') sorts the DataFrame by the specified column. Use ascending=False for descending order.
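
A minimal sketch of sorting, using the sample data from snippet 1. The multi-column sort at the end is an added illustration of the "one or more columns" case.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami']
})

by_age = df.sort_values('Age')                        # youngest first
by_age_desc = df.sort_values('Age', ascending=False)  # oldest first

# Sort by several columns, with a separate direction for each
by_city_then_age = df.sort_values(['City', 'Age'], ascending=[True, False])

print(by_age)
print(by_age_desc)
print(by_city_then_age)
```

sort_values() returns a new DataFrame and keeps the original index labels; chain .reset_index(drop=True) if a fresh 0-based index is wanted.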

8. Applying Functions to Columns

Applying a custom function to each element of a column.

Explanation:

  • df['Age'].apply(func) applies a custom function to each element in the 'Age' column.
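
As an illustration of apply(), the sketch below labels each row by age. The function age_group and the 'AgeGroup' column are hypothetical names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami']
})

def age_group(age):
    """Return a coarse label for a single age value."""
    return 'under 25' if age < 25 else '25 and over'

# apply() calls age_group once per element and returns a new Series
df['AgeGroup'] = df['Age'].apply(age_group)

print(df)
```

For simple arithmetic, a vectorized expression such as df['Age'] + 1 is usually faster than apply(), since it avoids a Python-level function call per element.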

9. Merging DataFrames

Merging two DataFrames on a common column.

Explanation:

  • pd.merge(df1, df2, on='column_name') merges two DataFrames based on a common column. The how parameter defines the type of join: inner (the default), outer, left, or right.
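
The join types above can be compared with a small sketch. The two DataFrames, people and salaries, and their contents are hypothetical.

```python
import pandas as pd

# Two hypothetical tables that share the 'Name' column
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
salaries = pd.DataFrame({
    'Name': ['Bob', 'Charlie', 'Eve'],
    'Salary': [70000, 65000, 80000]
})

# Inner join (the default): only names present in both tables
inner = pd.merge(people, salaries, on='Name')

# Left join: every row of people; Salary is NaN where there is no match
left = pd.merge(people, salaries, on='Name', how='left')

# Outer join: every name from either table
outer = pd.merge(people, salaries, on='Name', how='outer')

print(inner)
print(left)
print(outer)
```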

10. Pivot Table

Creating a pivot table to summarize data.

Explanation:

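  • pd.pivot_table(df, values='column_name', index='group_column') creates a pivot table that summarizes the values for each group. It aggregates with the mean by default; pass aggfunc to use sum, count, or another function, and columns to spread a second grouping key across the table's columns.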
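
A minimal pivot table sketch follows. The sales data and its column names ('Region', 'Product', 'Amount') are hypothetical and chosen only to give the table something to summarize.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Amount': [100, 150, 200, 50, 300]
})

# One row per region; with no aggfunc given, values are averaged
mean_by_region = pd.pivot_table(sales, values='Amount', index='Region')

# Regions as rows, products as columns, amounts summed in each cell
totals = pd.pivot_table(
    sales,
    values='Amount',
    index='Region',
    columns='Product',
    aggfunc='sum'
)

print(mean_by_region)
print(totals)
```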


Conclusion:

pandas provides a comprehensive set of tools to handle and analyze structured data. Whether you're performing basic data manipulation, cleaning, aggregation, or advanced data analysis, pandas simplifies the task, allowing you to focus on the logic of your analysis rather than the implementation details.
