How to Create and Access a Pandas DataFrame Effectively

1. Introduction: What Is a DataFrame in Pandas?

Pandas is a powerful Python library for data manipulation and analysis. At its core is the DataFrame, a two-dimensional, tabular data structure similar to an Excel spreadsheet. It has labeled rows and columns, and each column can hold a different data type (text, integers, floats, and so on), which makes it well suited to real-world datasets.

This tutorial covers three common ways to create a DataFrame, then shows how to use its two main indexers, `.loc[]` and `.iloc[]`, to access and manipulate data efficiently.

2. Three Ways to Create a DataFrame

2.1 Creating a DataFrame from a List of Lists

You can pass a list of lists (one inner list per row) and specify the column headers to create a simple table:

import pandas as pd
data = [["Google", 25], ["Baidu", 30], ["Bing", 22]]
df = pd.DataFrame(data, columns=["Site", "Age"])
print(df)

This creates a DataFrame with automatic row indexing (0, 1, 2) and custom column names.
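Pandas infers a separate data type for each column, which you can check with `df.dtypes`. A minimal check on the same data:

```python
import pandas as pd

data = [["Google", 25], ["Baidu", 30], ["Bing", 22]]
df = pd.DataFrame(data, columns=["Site", "Age"])

# Each column gets its own dtype: text stays text, integers stay integers
print(df.dtypes)

# The default row index runs 0, 1, 2
print(list(df.index))  # [0, 1, 2]
```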

2.2 Creating a DataFrame from a Dictionary

This is one of the most intuitive and readable ways to build a DataFrame:

import pandas as pd

data = {
    "Site": ["Google", "Baidu", "Bing"],
    "Age": [25, 30, 22]
}
df = pd.DataFrame(data)
print(df)

Each dictionary key becomes a column, and the values are the data for that column.
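Because each key maps to a whole column, the columns follow the dictionary's key order and every value list must have the same length; pandas raises a `ValueError` otherwise. A short sketch (the mismatched data is made up for illustration):

```python
import pandas as pd

data = {"Site": ["Google", "Baidu", "Bing"], "Age": [25, 30, 22]}
df = pd.DataFrame(data)

# Columns appear in the dictionary's key order
print(list(df.columns))  # ['Site', 'Age']

# Lists of different lengths cannot form a rectangular table
mismatch_rejected = False
try:
    pd.DataFrame({"Site": ["Google", "Baidu", "Bing"], "Age": [25, 30]})
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```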

2.3 Creating a DataFrame from a NumPy Array

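A NumPy array is convenient when your data already lives in NumPy, but keep in mind that an array holds a single data type. In the example below, NumPy stores the ages as strings, so the Age column has to be converted back to numbers: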

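import numpy as np
import pandas as pd

# NumPy stores this mixed data as strings, including the ages
arr = np.array([["Google", 25], ["Baidu", 30], ["Bing", 22]])
df = pd.DataFrame(arr, columns=["Site", "Age"])

# Convert the Age column back to numbers
df["Age"] = pd.to_numeric(df["Age"])
print(df)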

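This approach is best suited to homogeneous numeric data, such as a matrix of measurements, where the array's single dtype carries over to the DataFrame without any conversion.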
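The contrast is easy to see side by side. This sketch (the `calories`/`duration` values are illustrative) builds one DataFrame from an all-integer array, which keeps an integer dtype, and one from the mixed array above, whose numbers arrive as strings:

```python
import numpy as np
import pandas as pd

# Homogeneous numeric data: the integer dtype carries over to every column
nums = np.array([[420, 50], [380, 40], [390, 45]])
df_num = pd.DataFrame(nums, columns=["calories", "duration"])
print(df_num.dtypes)

# Mixed data: NumPy converts everything to strings before pandas sees it
mixed = np.array([["Google", 25], ["Baidu", 30], ["Bing", 22]])
print(mixed.dtype.kind)  # U (Unicode strings)
df_mixed = pd.DataFrame(mixed, columns=["Site", "Age"])
print(isinstance(df_mixed.loc[0, "Age"], str))  # True
```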

3. Accessing Data in a DataFrame

3.1 Using `loc[]` for Label-Based Indexing

The `loc[]` indexer is used to select rows (and columns) by their labels:

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df.loc["day2"])
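A single row label returns that row as a Series indexed by column name; passing a row label and a column label together returns one value. A minimal sketch on the same data:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

row = df.loc["day2"]            # one row as a Series
print(type(row).__name__)       # Series
print(row["calories"])          # 380

value = df.loc["day2", "duration"]  # a single cell
print(value)                    # 40
```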

To select multiple rows or specific columns:

print(df.loc[["day1", "day3"]]) # Multiple rows
print(df.loc[["day1", "day3"], ["calories"]]) # Rows + Columns

3.2 Using `iloc[]` for Position-Based Indexing

To select rows and columns by their integer position, regardless of the index labels, use `iloc[]`:

print(df.iloc[[0, 1], [0]])

This retrieves the rows at positions 0 and 1 (`day1` and `day2`) and the column at position 0 (`calories`).
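Slicing with `iloc[]` follows Python's rules: the end position is excluded and negative positions count from the end. The difference from `loc[]` shows up most clearly when the index labels are integers that do not match the positions; the index values below are hypothetical, chosen to make that mismatch visible:

```python
import pandas as pd

# Integer labels deliberately out of positional order
df = pd.DataFrame({"calories": [420, 380, 390]}, index=[2, 0, 1])

print(df.iloc[0:2, 0].tolist())  # positions 0 and 1 -> [420, 380]
print(df.iloc[-1, 0])            # last row by position -> 390

# The same number means different rows to loc and iloc
print(df.loc[0, "calories"])     # row labeled 0 -> 380
print(df.iloc[0, 0])             # row at position 0 -> 420
```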

4. ✅ Conclusion

Knowing how to create and access DataFrame objects is essential for any data analysis workflow in Python. Choosing the right creation method and access strategy lets you handle a wide range of datasets smoothly.

Key takeaways:

  1. Use `loc[]` for label-based selections
  2. Use `iloc[]` for position-based selections
  3. Choose your creation method based on the data source (list, dict, array)
  4. Once mastered, these tools greatly accelerate your data manipulation process

