1. Introduction: What Is a DataFrame in Pandas?
Pandas is a powerful library in Python designed for data manipulation and analysis. At its core is the DataFrame, a two-dimensional, tabular data structure similar to an Excel spreadsheet. It allows labeled rows and columns, and supports mixed data types, making it ideal for handling real-world datasets.
This tutorial explores three key methods to create a DataFrame and dives into two powerful indexers—`.loc[]` and `.iloc[]`—to access and manipulate data efficiently.
2. Three Ways to Create a DataFrame
2.1 Creating a DataFrame from a List of Lists
You can pass a list of lists, with one inner list per row, and define the column headers to create a simple table:
import pandas as pd

data = [["Google", 25], ["Baidu", 30], ["Bing", 22]]
df = pd.DataFrame(data, columns=["Site", "Age"])
print(df)
This creates a DataFrame with automatic row indexing (0, 1, 2) and custom column names.
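As a quick check of the claims above, the following sketch inspects the row labels, the shape, and the inferred type of the `Age` column. The variable names (`row_labels`, `shape`, `age_is_integer`) are chosen here for illustration only.

```python
import pandas as pd

data = [["Google", 25], ["Baidu", 30], ["Bing", 22]]
df = pd.DataFrame(data, columns=["Site", "Age"])

# With no index argument, pandas labels the rows 0, 1, 2.
row_labels = list(df.index)

# Three rows, two columns.
shape = df.shape

# The integers in the inner lists are inferred as an integer column.
age_is_integer = pd.api.types.is_integer_dtype(df["Age"])

print(row_labels, shape, age_is_integer)
```

Checking `df.shape` and the column types right after construction is a cheap way to confirm that each inner list was read as a row rather than a column.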
2.2 Creating a DataFrame from a Dictionary
This is one of the most intuitive and readable ways to build a DataFrame:
data = {
    "Site": ["Google", "Baidu", "Bing"],
    "Age": [25, 30, 22]
}
df = pd.DataFrame(data)
print(df)
Each dictionary key becomes a column, and the values are the data for that column.
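Because each value supplies a whole column, all the lists need to have the same length. The sketch below shows the column order following the key order, and the `ValueError` pandas raises for lists of unequal length. The `lengths_ok` flag is a name made up for this example.

```python
import pandas as pd

data = {"Site": ["Google", "Baidu", "Bing"], "Age": [25, 30, 22]}
df = pd.DataFrame(data)

# Columns appear in the same order as the dictionary keys.
column_names = list(df.columns)

# Lists of different lengths cannot be aligned into columns.
try:
    pd.DataFrame({"Site": ["Google", "Baidu"], "Age": [25]})
    lengths_ok = True
except ValueError:
    lengths_ok = False

print(column_names, lengths_ok)
```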
2.3 Creating a DataFrame from a NumPy Array
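If your data already lives in a NumPy array, you can pass it straight to the constructor. A NumPy array holds a single data type, so the mixed array below is stored entirely as strings:
import numpy as np
import pandas as pd

arr = np.array([["Google", 25], ["Baidu", 30], ["Bing", 22]])
df = pd.DataFrame(arr, columns=["Site", "Age"])
# The ages arrive as strings, so convert them back to integers.
df["Age"] = df["Age"].astype(int)
print(df)
This approach fits best when the data is a homogeneous numeric array, where no conversion is needed and numerical computation is the priority.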
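To see how the array's data type carries over, the sketch below compares a purely numeric array with a mixed one. The `scores` values reuse the calories and duration figures from later in this tutorial. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# A homogeneous integer array keeps its numeric type in the DataFrame.
scores = np.array([[420, 50], [380, 40], [390, 45]])
df_num = pd.DataFrame(scores, columns=["calories", "duration"])
numeric_kept = pd.api.types.is_integer_dtype(df_num["calories"])

# A mixed array is coerced by NumPy to a single Unicode string type.
mixed = np.array([["Google", 25], ["Baidu", 30]])
df_mixed = pd.DataFrame(mixed, columns=["Site", "Age"])
mixed_kind = mixed.dtype.kind          # "U" means Unicode string
age_is_numeric = pd.api.types.is_numeric_dtype(df_mixed["Age"])

print(numeric_kept, mixed_kind, age_is_numeric)
```

For mixed columns, a dictionary or list of lists is usually the simpler choice, since pandas can then infer a type per column.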
3. Accessing Data in a DataFrame
3.1 Using `loc[]` for Label-Based Indexing
The `loc[]` indexer is used to select rows (and columns) by their labels:
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df.loc["day2"])
To select multiple rows or specific columns:
print(df.loc[["day1", "day3"]])                # Multiple rows
print(df.loc[["day1", "day3"], ["calories"]])  # Rows + columns
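Beyond lists of labels, `loc[]` also accepts a single row and column label, a label slice, and a boolean mask. The following is a minimal sketch on the same data. The threshold of 385 calories is an arbitrary value picked for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

# A single row label returns a Series keyed by column name.
single = df.loc["day2"]

# Label slices include both endpoints, unlike ordinary Python slices.
sliced = df.loc["day1":"day2"]

# A row label plus a column label returns one scalar value.
value = df.loc["day3", "duration"]

# A boolean mask keeps only the rows where the condition holds.
filtered = df.loc[df["calories"] > 385]

print(single, sliced, value, filtered, sep="\n\n")
```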
3.2 Using `iloc[]` for Position-Based Indexing
To select by integer position rather than by label, use `iloc[]`. It counts rows and columns from 0, regardless of what the index labels are:
print(df.iloc[[0, 1], [0]])
This retrieves the rows at positions 0 and 1 (`day1` and `day2`) and the column at position 0 (`calories`).
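Position-based slicing follows ordinary Python rules, so the end position is excluded and negative positions count from the end. The sketch below illustrates this on the same labeled data. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

# Positions 0 and 1 only: the end of an iloc slice is excluded.
first_two = df.iloc[0:2]

# Negative positions count from the end, so -1 is the last row.
last_row = df.iloc[-1]

# Row position 1, column position 0: a single scalar.
cell = df.iloc[1, 0]

print(first_two, last_row, cell, sep="\n\n")
```

Note the contrast with `loc[]`, where a label slice such as `"day1":"day2"` includes both endpoints.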
4. ✅ Conclusion
Understanding how to create and access DataFrame objects is essential for any data analysis workflow in Python. The flexibility offered by different creation methods and access strategies enables you to handle various types of datasets smoothly.
Key takeaways:
- Use `loc[]` for label-based selections
- Use `iloc[]` for position-based selections
- Choose your creation method based on the data source (list, dict, array)
- Once mastered, these tools greatly accelerate your data manipulation process