Here are 10 Python code snippets demonstrating data manipulation and cleaning using the powerful Pandas library:
1. Loading Data from CSV
Loading data from a CSV file into a Pandas DataFrame.
import pandas as pd
# Load data from a CSV file
df = pd.read_csv('data.csv')
# Display first 5 rows of the DataFrame
print(df.head())
This code loads data from a CSV file named data.csv and displays the first 5 rows.
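To try the loading step without a file on disk, the sketch below reads the same kind of data from an in-memory string. The CSV contents and column names here are made up for illustration, and `io.StringIO` simply stands in for the `data.csv` path.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for data.csv
csv_text = "Name,Age\nAlice,25\nBob,\nCharlie,30\n"

# read_csv accepts a file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))

# Empty fields are parsed as NaN, so 'Age' is read as a float column
print(df.head())
print(df.dtypes)
```

Checking `df.dtypes` right after loading is a quick way to spot columns that were read with an unexpected type.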
2. Data Cleaning: Handling Missing Values
Filling missing values with a specified value.
import pandas as pd
# Sample DataFrame with missing values
data = {'Name': ['Alice', 'Bob', 'Charlie', None], 'Age': [25, None, 30, 22]}
df = pd.DataFrame(data)
# Fill missing values in 'Age' with the column mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
# Display cleaned DataFrame
print(df)
This example demonstrates how to fill missing values in the Age column with the mean of the column.
3. Dropping Rows with Missing Values
Removing rows that have missing values.
This snippet removes any rows with missing data.
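Dropping every incomplete row can be more aggressive than needed. As a minimal sketch, assuming the same sample data as in the missing-values example, `dropna` can be narrowed with its `subset` and `how` parameters:

```python
import pandas as pd

# Sample DataFrame with missing values
data = {'Name': ['Alice', 'Bob', 'Charlie', None], 'Age': [25, None, 30, 22]}
df = pd.DataFrame(data)

# Drop a row only when 'Age' is missing; a missing 'Name' is tolerated
df_age_known = df.dropna(subset=['Age'])

# Drop a row only when every column is missing (no such row here)
df_not_empty = df.dropna(how='all')

print(df_age_known)
print(df_not_empty)
```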
4. Converting Data Types
Changing the data type of a column.
This code converts the Price column from strings to floating-point numbers.
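A conversion with `astype(float)` raises an error if any string cannot be parsed. The sketch below shows one possible alternative, `pd.to_numeric` with `errors='coerce'`; the `'N/A'` entry is a hypothetical bad value added for illustration.

```python
import pandas as pd

# Hypothetical sample: one price is not a valid number
df = pd.DataFrame({'ID': ['001', '002', '003'],
                   'Price': ['10.5', 'N/A', '9.3']})

# astype(float) would raise ValueError on 'N/A';
# errors='coerce' turns unparseable values into NaN instead
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')

print(df)
print(df.dtypes)
```

The resulting NaN values can then be filled or dropped with the techniques from the missing-values sections.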
5. Renaming Columns
Renaming columns in a DataFrame.
This snippet demonstrates how to rename columns in the DataFrame.
6. Filtering Data Based on Conditions
Filtering rows based on a condition.
This example filters the DataFrame to show only rows where the Age column is greater than 30.
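Filters often need more than one condition. A small sketch, assuming a sample DataFrame with `Name` and `Age` columns, that combines two comparisons with `&`:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40]})

# Combine conditions with & (and) or | (or);
# each comparison needs its own parentheses
in_range = df[(df['Age'] > 25) & (df['Age'] < 40)]

print(in_range)
```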
7. Sorting Data
Sorting data by one or more columns.
This snippet sorts the DataFrame by the Age column in ascending order.
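Sorting by more than one column works the same way, with a list passed to `by`. The sketch below uses made-up data with a tie in `Age` to show how a second column breaks the tie:

```python
import pandas as pd

# Hypothetical sample with a tie in 'Age'
df = pd.DataFrame({'Name': ['Charlie', 'Alice', 'Bob'],
                   'Age': [30, 25, 30]})

# Sort by Age descending, then by Name ascending to break ties
sorted_df = df.sort_values(by=['Age', 'Name'], ascending=[False, True])

# Renumber the rows after sorting
sorted_df = sorted_df.reset_index(drop=True)

print(sorted_df)
```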
8. Grouping Data
Grouping data and performing aggregation operations.
This example groups data by the Category column and calculates the sum of the Value column for each group.
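When more than one statistic is needed per group, `agg` can compute several at once. A minimal sketch, assuming the same `Category` and `Value` columns:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B'],
                   'Value': [10, 20, 30, 40]})

# Select the column first, then compute several aggregates per group
summary = df.groupby('Category')['Value'].agg(['sum', 'mean', 'count'])

print(summary)
```

Selecting `['Value']` before aggregating keeps unrelated columns out of the result.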
9. Applying Functions to Columns
Applying a custom function to a DataFrame column.
This snippet demonstrates how to apply a lambda function to a column (Age) to compute a new column (Age in 10 years).
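For simple arithmetic like this, plain column arithmetic gives the same result without a lambda, and `apply` is better reserved for logic that has no direct vectorized form. In the sketch below, the `Age group` column is a hypothetical addition for illustration:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Vectorized arithmetic: same result as apply(lambda x: x + 10)
df['Age in 10 years'] = df['Age'] + 10

# apply with custom per-value logic (hypothetical 'Age group' column)
df['Age group'] = df['Age'].apply(
    lambda x: 'under 30' if x < 30 else '30 and over'
)

print(df)
```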
10. Concatenating DataFrames
Concatenating multiple DataFrames along rows or columns.
This code concatenates two DataFrames df1 and df2 along the rows, combining them into a single DataFrame.
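A minimal sketch of the concatenation described here, assuming two small sample DataFrames `df1` and `df2` with matching columns. The data and the extra `df3` used for the column-wise case are made up for illustration.

```python
import pandas as pd

# Hypothetical sample DataFrames with the same columns
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

# Concatenate along rows; ignore_index=True renumbers the rows from 0
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)

# Concatenate along columns with axis=1; rows are aligned on the index
df3 = pd.DataFrame({'City': ['Paris', 'Oslo']})
side_by_side = pd.concat([df1, df3], axis=1)
print(side_by_side)
```

Without `ignore_index=True`, the row-wise result would keep the original index labels (0, 1, 0, 1), which can cause surprises in later label-based lookups.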
These snippets cover various common data manipulation tasks in Pandas, such as loading, cleaning, filtering, grouping, and transforming data. You can combine these techniques to perform more complex data analysis and transformation tasks.
Code for snippet 3 (Dropping Rows with Missing Values):
import pandas as pd
# Sample DataFrame with missing values
data = {'Name': ['Alice', 'Bob', 'Charlie', None], 'Age': [25, None, 30, 22]}
df = pd.DataFrame(data)
# Drop rows with any missing values
df_cleaned = df.dropna()
# Display the cleaned DataFrame
print(df_cleaned)
Code for snippet 4 (Converting Data Types):
import pandas as pd
# Sample DataFrame
data = {'ID': ['001', '002', '003'], 'Price': ['10.5', '12.8', '9.3']}
df = pd.DataFrame(data)
# Convert the 'Price' column to float
df['Price'] = df['Price'].astype(float)
# Display the DataFrame with updated types
print(df)
Code for snippet 5 (Renaming Columns):
import pandas as pd
# Sample DataFrame
data = {'name': ['Alice', 'Bob'], 'age': [25, 30]}
df = pd.DataFrame(data)
# Rename columns
df = df.rename(columns={'name': 'Name', 'age': 'Age'})
# Display the DataFrame with renamed columns
print(df)
Code for snippet 6 (Filtering Data Based on Conditions):
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
# Display the filtered DataFrame
print(filtered_df)
Code for snippet 7 (Sorting Data):
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]}
df = pd.DataFrame(data)
# Sort by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age')
# Display the sorted DataFrame
print(sorted_df)
Code for snippet 8 (Grouping Data):
import pandas as pd
# Sample DataFrame
data = {'Category': ['A', 'A', 'B', 'B'], 'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data)
# Group by 'Category' and calculate the sum of 'Value'
grouped_df = df.groupby('Category').sum()
# Display the grouped DataFrame
print(grouped_df)
Code for snippet 9 (Applying Functions to Columns):
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
# Apply a lambda function to the 'Age' column
df['Age in 10 years'] = df['Age'].apply(lambda x: x + 10)
# Display the updated DataFrame
print(df)