How to parse JSON with Python

Learn to parse JSON strings with Python's built-in json module and convert JSON files using pandas.

What is JSON?

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write while also being easy for machines to parse and generate. It is widely used for transmitting data between a client and a server as an alternative to XML.

A JSON object is a collection of key-value pairs, where the keys are strings and the values can be any valid JSON data type: a string, number, boolean, null, array, or another object.

{
    "name": "John Doe",
    "age": 30,
    "city": "New York"
}

In this example, name, age, and city are the keys, and "John Doe", 30, and "New York" are the corresponding values.
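To make the value types listed above more concrete, here is a minimal sketch of how each one appears after parsing with Python's json module (covered in the next section). The JSON document and its field names are hypothetical and exist only for illustration.

```python
import json

# Hypothetical JSON document that uses every JSON value type
json_str = '''
{
    "name": "John Doe",
    "age": 30,
    "height": 1.83,
    "is_member": true,
    "nickname": null,
    "hobbies": ["reading", "cooking"],
    "address": {"city": "New York"}
}
'''

data = json.loads(json_str)

# print each key with the Python type its value was converted to
for key, value in data.items():
    print(f"{key}: {type(value).__name__}")
```

Strings become str, numbers become int or float, true and false become bool, null becomes None, arrays become list, and objects become dict.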

How to parse JSON strings in Python

To parse a JSON string in Python, you can use the built-in json module. Two of its functions work directly with JSON strings:

  • json.loads() parses a JSON string and returns a Python object.
  • json.dumps() takes a Python object and returns a JSON string.

Here's an example of how to use json.loads() to parse a JSON string:

import json

# JSON string
json_str = '{"name": "John", "age": 30, "city": "New York"}'

# parse JSON string
data = json.loads(json_str)

# print Python object
print(data)

In this example, we import the json module, define a JSON string, and use json.loads() to parse it into a Python object. We then print the resulting Python object.

Note that json.loads() will raise a json.decoder.JSONDecodeError exception if the input string is not valid JSON.

After running the script above we can expect to get the following output printed to the console:

{'name': 'John', 'age': 30, 'city': 'New York'}
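Since json.loads() raises an exception on invalid input, it is usually worth guarding the call. The following is a minimal sketch of one way to do that; the invalid input string and the fallback to None are assumptions made for the example.

```python
import json

# Deliberately invalid JSON: keys and strings must use double quotes
bad_json = "{'name': 'John'}"

try:
    data = json.loads(bad_json)
except json.JSONDecodeError as err:
    # err.msg, err.lineno, and err.colno describe where parsing failed
    print(f"Invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})")
    data = None
```

json.JSONDecodeError is the same class as json.decoder.JSONDecodeError and is a subclass of ValueError, so code that catches ValueError also handles it.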

How to read and parse JSON files in Python

To parse a JSON file in Python, you can use the same json module we used in the previous section. The only difference is that instead of passing a JSON string to json.loads(), we'll pass an open file object to json.load().

For example, assume we have a file named **data.json** that we would like to parse and read. Here's how we would do it:

import json

# open JSON file
with open('data.json', 'r') as f:
    # parse JSON data
    data = json.load(f)

# print Python object
print(data)

In this example, we use the open() function to open a JSON target file called data.json in read mode. We then pass the file object to json.load(), which parses the JSON data and returns a Python object. We then print the resulting Python object.

Note that if the JSON file is not valid JSON, json.load() will raise a json.decoder.JSONDecodeError exception.
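In practice, a file may be missing as well as malformed. Here is a hedged sketch of a small helper that handles both cases. The helper name load_json_file and the choice to return None on failure are assumptions for this example, and the sample file is written to a temporary directory so the snippet runs on its own.

```python
import json
import os
import tempfile


def load_json_file(path):
    """Return the parsed content of a JSON file, or None if it can't be read."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except json.JSONDecodeError as err:
        print(f"Invalid JSON in {path}: {err}")
    return None


# write a small sample file so the example is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"name": "John", "age": 30}, f)

    loaded = load_json_file(path)
    missing = load_json_file(os.path.join(tmp_dir, "missing.json"))

print(loaded)
```

Here json.dump() is the file-writing counterpart of json.dumps(), just as json.load() is the file-reading counterpart of json.loads().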

How to pretty print JSON data in Python

When working with JSON data in Python, it can often be helpful to pretty print the data, which means to format it in a more human-readable way. The json module provides a method called json.dumps() that can be used to pretty print JSON data.

Here's an example of how to pretty print JSON data in Python:

import json

# define JSON data
data = {
    "name": "John",
    "age": 30,
    "city": "New York",
    "hobbies": ["reading", "traveling", "cooking"]
}

# pretty print JSON data
pretty_json = json.dumps(data, indent=4)

# print pretty JSON
print(pretty_json)

Output:

{
    "name": "John",
    "age": 30,
    "city": "New York",
    "hobbies": [
        "reading",
        "traveling",
        "cooking"
    ]
}

In this example, we define a Python dictionary representing JSON data, and then use json.dumps() with the indent argument set to 4 to pretty print the data. We then print the resulting pretty printed JSON string.

Note that indent is an optional argument to json.dumps() that specifies the number of spaces to use for each indentation level. If indent is not specified, json.dumps() returns the JSON on a single line.
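To see the effect of indent next to a couple of related options, here is a short sketch comparing the default output with a compact and a sorted variant. The sort_keys and separators arguments are standard json.dumps() parameters; the sample data is made up for the example.

```python
import json

data = {"name": "John", "city": "New York", "age": 30}

# default: a single line with ", " and ": " as separators
default_json = json.dumps(data)

# compact: no whitespace after the separators
compact_json = json.dumps(data, separators=(",", ":"))

# keys sorted alphabetically, indented with 2 spaces
sorted_json = json.dumps(data, indent=2, sort_keys=True)

print(default_json)
print(compact_json)
print(sorted_json)
```

The compact form is handy when size matters, for example when sending data over a network, while the sorted and indented form makes it easier to compare two documents by eye.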

How to parse JSON with pandas

In addition to the built-in json package, you can also use pandas to parse and work with JSON data in Python. pandas provides a method called pandas.read_json() that can read JSON data into a DataFrame.

Compared to using the built-in json package, working with pandas can be easier and more convenient when we want to analyze and manipulate the data further, as it allows us to use the powerful and flexible DataFrame object.

Here is an example of how to parse JSON data with pandas:

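import json
from io import StringIO

import pandas as pd

# define JSON data
data = {
    "name": ["John", "Jane", "Bob"],
    "age": [30, 25, 35],
    "city": ["New York", "London", "Paris"]
}

# convert JSON to DataFrame using pandas
# (newer pandas versions deprecate passing a raw JSON string,
# so the string is wrapped in a file-like StringIO object)
df = pd.read_json(StringIO(json.dumps(data)))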

# print DataFrame
print(df)

Output:


   name  age      city
0  John   30  New York
1  Jane   25    London
2   Bob   35     Paris

In this example, we define a Python dictionary representing JSON data, and use json.dumps() to convert it to a JSON string. We then use pandas.read_json() to read the JSON string into a DataFrame. Finally, we print the resulting DataFrame.

One benefit of using pandas to parse JSON data is that we can easily manipulate the resulting DataFrame, for example by selecting columns, filtering rows, or grouping data.

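import json
from io import StringIO

import pandas as pd

# define JSON data
data = {
    "name": ["John", "Jane", "Bob"],
    "age": [30, 25, 35],
    "city": ["New York", "London", "Paris"]
}

# convert JSON to DataFrame using pandas
# (newer pandas versions deprecate passing a raw JSON string,
# so the string is wrapped in a file-like StringIO object)
df = pd.read_json(StringIO(json.dumps(data)))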

# select columns
df = df[["name", "age"]]

# filter rows
df = df[df["age"] > 30]

# print resulting DataFrame
print(df)

Output:

   name  age
2   Bob   35

In this example, we select only the name and age columns from the DataFrame, and filter out any rows where the age is less than or equal to 30.

Using pandas to parse and work with JSON data in Python can be a convenient and powerful alternative to using the built-in json package. It allows us to easily manipulate and analyze the data using the DataFrame object, which offers a rich set of functionality for working with tabular data.
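Grouping is mentioned above but not shown, so here is a minimal sketch of it. It assumes pandas is installed and uses hypothetical records shaped as a JSON array of objects, a layout many APIs return.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical records: a JSON array of objects
json_str = json.dumps([
    {"name": "John", "age": 30, "city": "New York"},
    {"name": "Jane", "age": 25, "city": "London"},
    {"name": "Bob", "age": 35, "city": "New York"},
])

# read the JSON string through a file-like wrapper
df = pd.read_json(StringIO(json_str))

# group rows by city and compute the mean age per group
mean_age_by_city = df.groupby("city")["age"].mean()

print(mean_age_by_city)
```

The result is a Series indexed by city, with one mean age per group.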

How to convert JSON to CSV in Python

Sometimes you might want to convert JSON data into a CSV format. Luckily, the pandas library can also help with that.

You can use pandas.read_json() to read JSON data into a DataFrame, and then call DataFrame.to_csv() to write the DataFrame to a CSV file.

Here is an example of how to convert JSON data to CSV in Python using pandas:

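import json
from io import StringIO

import pandas as pd

# define JSON data
data = {
    "name": ["John", "Jane", "Bob"],
    "age": [30, 25, 35],
    "city": ["New York", "London", "Paris"]
}

# convert JSON to DataFrame
# (newer pandas versions deprecate passing a raw JSON string,
# so the string is wrapped in a file-like StringIO object)
df = pd.read_json(StringIO(json.dumps(data)))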

# write DataFrame to CSV file
df.to_csv("data.csv", index=False)

# read CSV file
df = pd.read_csv("data.csv")

# print DataFrame
print(df)

Output:

   name  age      city
0  John   30  New York
1  Jane   25    London
2   Bob   35     Paris

In this example, we define a Python dictionary representing JSON data, and use json.dumps() to convert it to a JSON string. We then use pandas.read_json() to read the JSON string into a DataFrame and use DataFrame.to_csv() to write it to a CSV file. We then use pandas.read_csv() to read the CSV file back into a DataFrame and print it.

Note that when calling to_csv(), we pass index=False to exclude the row index from the output CSV file.
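If pandas is not available, a similar conversion can be done with the standard library alone. The sketch below is an alternative to the pandas approach above, not a replacement for it. It assumes the JSON is a flat array of objects that share the same keys, and it writes to an in-memory buffer instead of a real file so it can run anywhere.

```python
import csv
import io
import json

# Hypothetical input: a JSON array of flat objects
json_str = '[{"name": "John", "age": 30}, {"name": "Jane", "age": 25}]'
rows = json.loads(json_str)

# in-memory buffer; for a real file use open("data.csv", "w", newline="")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Nested objects or arrays would need to be flattened first, which is one area where pandas tends to be more convenient.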

Optimizing performance for large JSON files

If you need to handle large JSON files, an effective approach is streaming the JSON file. This lets you parse and process data incrementally without loading the entire file into memory.

Third-party libraries such as ijson parse a file iteratively, yielding one piece of data at a time. Cleaning and validating the data before heavier processing can further reduce computational overhead.
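The incremental idea can be illustrated without any third-party dependency. The sketch below does not use ijson. Instead, it assumes the data is stored in the JSON Lines format (one JSON object per line), which lets the standard json module process a file record by record. The helper name iter_json_lines and the sample records are hypothetical.

```python
import io
import json


def iter_json_lines(lines):
    """Yield one parsed object per non-empty line (JSON Lines format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# in-memory stand-in for a large file opened with open(...)
fake_file = io.StringIO(
    '{"name": "John", "age": 30}\n'
    '{"name": "Jane", "age": 25}\n'
    '{"name": "Bob", "age": 35}\n'
)

# aggregate while iterating, so only one record is held in memory at a time
total_age = 0
count = 0
for record in iter_json_lines(fake_file):
    total_age += record["age"]
    count += 1

print(total_age / count)
```

For a single large JSON document, such as one huge array, a streaming parser like ijson is the more suitable tool, because such a document cannot be split line by line.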

Parsing JSON cheat sheet

Parsing JSON strings in Python

  • To parse JSON strings in Python, use the json.loads() method from the built-in json module. This method converts a JSON string into a Python object.
  • If the input string is not valid JSON, json.loads() will raise a json.decoder.JSONDecodeError.

Reading and parsing JSON files in Python

  • To read and parse JSON files, use the json.load() method with a file object. This approach is similar to parsing a JSON string but operates directly on a file.

Pretty printing JSON data

  • For better readability, JSON data can be pretty printed using the json.dumps() method with the indent parameter, which formats the output with specified indentation.

Parsing JSON with pandas

  • pandas can also be used to parse and work with JSON data, offering a function called pd.read_json() that reads JSON data into a DataFrame. This is particularly useful for data analysis and manipulation due to the powerful features of DataFrames.

Manipulating DataFrames

  • After converting JSON to a DataFrame, pandas allows for extensive data manipulation capabilities such as selecting columns, filtering rows, and grouping data.

Converting JSON to CSV

  • pandas can convert JSON data to CSV: read the JSON into a DataFrame with pd.read_json(), then write it out with the DataFrame.to_csv() method. This is useful for data exchange and storage in a more universally readable format.

Percival Villalva
Developer Advocate on a mission to help developers build scalable, human-like bots for data extraction and web automation.
