How to build an AI agent

A complete step-by-step guide to creating, publishing, and monetizing AI agents on the Apify platform

In this guide, you’ll learn step by step how to create an AI agent - specifically, an Instagram analysis agent - using CrewAI and Apify, and how to integrate it with LLMs and web scrapers. You’ll also learn how to configure tools, define prompts, make your agent public on the Apify platform, and monetize it.

The key components required are:

  • Good prompts to guide the agent.
  • A powerful set of tools to interact with the external world.
  • A strong LLM to process and connect everything together.
  • An agentic framework to handle situations where the LLM doesn't behave as expected.
  • A platform to run the agent and make the solution publicly available and scalable.

What are AI agents on Apify?

AI agents are goal-oriented systems capable of making independent decisions. They interact with their environment using predefined tools and workflows to automate complex tasks.

On Apify, an AI agent is built as an Actor. These are serverless cloud programs used for web scraping, data processing, and AI deployment.

Originally, Apify was built to run scrapers in the cloud, interact with the web, and achieve predefined goals. Over time, we realized that LLMs could also follow predefined workflows. Instead of writing rigid scripts, we could dynamically define goals and equip the agent with the right tools. This is essentially how AI agents work.

Why use Apify for AI agents?

Apify provides solutions to the main problems a developer faces when building an AI agent:

  • Serverless execution – No infrastructure headaches.
  • Stateful execution – Agents can have memory.
  • Monetization options – Easily charge for usage with developer-specified events.
  • Extensive tool ecosystem – Thousands of pre-existing tools (Actors) available.
  • Scalability and reliability – Built for production use.
  • Pre-integrated tools – Ready-to-use web scraping and automation capabilities.
📌
Note: If you’re not sure whether an AI agent is what you need, check out our article on when you need a workflow and when you need an agent.

How to build an AI agent on Apify (step-by-step guide)

1. Define the use case

In this tutorial, we’ll build a social media analysis agent that analyzes Instagram posts based on user queries. The agent will use the Instagram Scraper Actor to extract data.

Example use case:

  • Input: "Analyze the last 10 posts from @openai and summarize AI trends."
  • Output: Trend analysis based on post content.

2. Define input and output

The input could be a simple URL for website content analysis, a JSON config, or a textual query from the user. The agent’s output can be either a textual response or structured data - both can easily be stored on the Apify platform.

Example input:

  • User query (e.g., "Analyze @openai posts for AI trends")
  • OpenAI model selection

Example output:

  • Textual response with insights
  • Data stored in an Apify Dataset
📌
Note: Agents can be equipped with memory to store information and context between conversations. For example, a personal news agent might be a good fit for using memory to save a user's preferences, previous interactions, and mistakes. An agent specialized in a single task - like our Instagram analysis agent - won't need memory.
Instagram analysis agent on Apify

How the Instagram analysis agent works:

The agent takes a user query, runs Instagram analysis, and provides a helpful response to accomplish the task.
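
Conceptually, the flow can be sketched like this. This is a minimal illustration with stub tools - CrewAI does the real planning and tool routing, and the function names here are hypothetical:

```python
# Conceptual sketch of the agent's flow (not the CrewAI internals):
def handle_query(query, scrape_tool, summarize):
    # 1. Find the Instagram handles mentioned in the query.
    handles = [word.strip('@.,') for word in query.split() if word.startswith('@')]
    # 2. Call the scraping tool for each handle.
    posts = {handle: scrape_tool(handle) for handle in handles}
    # 3. Ask the model to summarize the scraped posts.
    return summarize(query, posts)

# Stub tools for illustration:
response = handle_query(
    'Analyze the posts of @openai and @googledeepmind',
    scrape_tool=lambda handle: [f'{handle}-post-1', f'{handle}-post-2'],
    summarize=lambda query, posts: f'Analyzed {sum(len(p) for p in posts.values())} posts',
)
```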

User query:

Analyze the posts of the @openai and @googledeepmind and summarize me current trends in the AI.

Agent’s response:

1. **@openai Posts Trends**:
   - Feature promotions for ChatGPT's interactive features (voice, video prompts).
   - Community engagement posts encouraging user stories.
   - Emphasis on creative applications (e.g., Sora).
   - Utilization of memes for broader appeal.
   - Educational outreach focusing on mental health support via AI.

2. **@googledeepmind Posts Trends**:
   - Launch of advanced AI models (Gemini 2.0, Veo 2).
   - Versatile applications in gaming, quantum computing, and research.
   - Collaboration with artists for diverse representations of AI.
   - Education through podcast series discussing ethical and practical AI.
   - Visualizing AI to enhance public understanding and accessibility.

**Current Trends in AI**:
- Interactivity and personalization of AI tools.
- Creative integration into workflows.
- Diverse applications across various fields.
- Community engagement and collaboration.
- Education on AI implications and responsible usage.

3. Define and integrate the agent

We’ll use the CrewAI framework to define our agent and integrate it with Apify.

CrewAI simplifies defining agents, tasks, and tools, making it ideal for this use case. Apify provides a Python CrewAI template that we’ll use as a starting point.

📌
You can find the full source code for this tutorial in the GitHub repository.

The template includes Actor definition, monetization helper functions, and an Instagram scraping tool using the Instagram Scraper Actor.


In addition to CrewAI, Apify supports other frameworks for building AI agents, including LangGraph, LangGraph.js, LlamaIndex, and the Bee Agent Framework. You can explore these options and their templates in the Apify templates section.


4. Create an Apify Actor

Install the Apify CLI:

npm install -g apify-cli

Create a new Actor from the CrewAI template:

apify create agent-actor -t python-crewai

The template's file structure includes:

  • .actor/ – Contains the Actor definition:
    • .actor/actor.json – Actor definition
    • .actor/input_schema.json – Input schema
    • .actor/dataset_schema.json – Dataset output schema
    • .actor/pay_per_event.json – Monetization configuration
  • src/ – Source code:
    • main.py – Actor execution, agent, and task definition
    • tools.py – Tool implementations
    • models.py – Pydantic models for structured tool output
    • ppe_utils.py – Helper functions for monetization

5. Define input and output schema

Update .actor/input_schema.json to define the Actor's inputs (see the input schema documentation for more details). This schema lets the user define the task for the agent via the query field and choose which model to use via modelName.

{
  "title": "CrewAI Agent Python",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "query": {
      "title": "Query",
      "type": "string",
      "description": "Query for the agent.",
      "editor": "textfield",
      "prefill": "What is the total number of likes and the total number of comments for the latest 10 posts on the @openai Instagram account? From the 10 latest posts, show me the most popular one."
    },
    "modelName": {
      "title": "OpenAI model",
      "type": "string",
      "description": "The OpenAI model to use. Currently supported models are gpt-4o and gpt-4o-mini.",
      "enum": [
        "gpt-4o",
        "gpt-4o-mini"
      ],
      "default": "gpt-4o-mini",
      "prefill": "gpt-4o-mini"
    }
  },
  "required": ["query"]
}
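
At run time, the platform validates the input against this schema and fills in defaults. A minimal sketch of that behavior, with the field names taken from the schema above (the helper itself is illustrative - Apify does this for you):

```python
# Illustrative required-field checking and defaults from the input schema:
SCHEMA_DEFAULTS = {'modelName': 'gpt-4o-mini'}
REQUIRED_FIELDS = ['query']

def resolve_input(actor_input: dict) -> dict:
    if missing := [f for f in REQUIRED_FIELDS if f not in actor_input]:
        raise ValueError(f'Missing required input fields: {missing}')
    # Explicit input values override the schema defaults.
    return {**SCHEMA_DEFAULTS, **actor_input}

resolved = resolve_input({'query': 'Analyze @openai posts'})
```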

After the Actor finishes its execution, it will display the query and response fields from the dataset in an overview table.

Define the output schema in .actor/dataset_schema.json (see dataset schema documentation for more details):

{
  "actorSpecification": 1,
  "views": {
    "overview": {
      "title": "Overview",
      "transformation": {
        "fields": ["query", "response"]
      },
      "display": {
        "component": "table",
        "properties": {
          "query": {
            "label": "Query",
            "format": "text"
          },
          "response": {
            "label": "Response",
            "format": "text"
          }
        }
      }
    }
  }
}
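
The overview view above simply projects each dataset item onto the listed fields. A stdlib sketch of that transformation:

```python
# The "overview" view keeps only the listed fields from each dataset item:
def apply_view(items: list[dict], fields: list[str]) -> list[dict]:
    return [{field: item.get(field) for field in fields} for item in items]

items = [{'query': 'Analyze @openai', 'response': 'Trends: ...', 'debug': True}]
overview = apply_view(items, ['query', 'response'])
```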

6. Choose the tools

We decided that the social media analysis agent will use the Instagram post scraper tool to get the posts and analyze them.

This tool is already implemented in this template using Instagram Scraper.

The tool returns structured output as Pydantic models, defined in src/models.py:

from pydantic import BaseModel, Field, RootModel

class InstagramPost(BaseModel):
    """Instagram Post Pydantic model."""

    url: str = Field(..., description='The URL of the post')
    likes_count: int = Field(..., description='The number of likes on the post', alias='likesCount')
    comments_count: int = Field(..., description='The number of comments on the post', alias='commentsCount')
    timestamp: str = Field(..., description='The timestamp when the post was published')
    caption: str | None = Field(None, description='The post caption')
    alt: str | None = Field(None, description='The post alt text')

class InstagramPosts(RootModel):
    """Root model for list of InstagramPosts."""

    root: list[InstagramPost]
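
The `alias` arguments matter because Instagram Scraper dataset items use camelCase keys. A stdlib sketch of the mapping the aliases perform (Pydantic does this automatically during validation):

```python
# camelCase dataset keys -> snake_case model attributes (illustrative):
ALIASES = {'likesCount': 'likes_count', 'commentsCount': 'comments_count'}

def normalize_item(item: dict) -> dict:
    return {ALIASES.get(key, key): value for key, value in item.items()}

raw_item = {'url': 'https://www.instagram.com/p/abc/', 'likesCount': 120, 'commentsCount': 8}
normalized = normalize_item(raw_item)
```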

The tool itself is defined in src/tools.py. It runs the Instagram Scraper Actor, retrieves the scraped posts from the run dataset, and returns them as a list of InstagramPost objects.

The most important parts of the tool for the agent are the description and the argument schema, which describe the tool and its inputs to the agent. They can be the deciding factor in whether the agent calls the tool successfully or fails:

import os

from apify import Actor
from apify_client import ApifyClient
from crewai.tools import BaseTool
from pydantic import BaseModel, Field, ValidationError

from src.models import InstagramPost, InstagramPosts

class InstagramScraperInput(BaseModel):
    """Input schema for InstagramScraper tool."""

    handle: str = Field(..., description="Instagram handle of the profile to scrape (without the '@' symbol).")
    max_posts: int = Field(default=30, description='Maximum number of posts to scrape.')

class InstagramScraperTool(BaseTool):
    """Tool for scraping Instagram profile posts."""

    name: str = 'Instagram Profile Posts Scraper'
    description: str = 'Tool to scrape Instagram profile posts.'
    args_schema: type[BaseModel] = InstagramScraperInput

    def _run(self, handle: str, max_posts: int = 30) -> list[InstagramPost]:
        run_input = {
            'directUrls': [f'https://www.instagram.com/{handle}/'],
            'resultsLimit': max_posts,
            'resultsType': 'posts',
            'searchLimit': 1,
        }
        if not (token := os.getenv('APIFY_TOKEN')):
            raise ValueError('APIFY_TOKEN environment variable is missing!')

        apify_client = ApifyClient(token=token)
        if not (run := apify_client.actor('apify/instagram-scraper').call(run_input=run_input)):
            msg = 'Failed to start the Actor apify/instagram-scraper'
            raise RuntimeError(msg)

        dataset_id = run['defaultDatasetId']
        dataset_items: list[dict] = (apify_client.dataset(dataset_id).list_items()).items

        try:
            posts: InstagramPosts = InstagramPosts.model_validate(dataset_items)
        except ValidationError as e:
            Actor.log.warning('Received invalid dataset items: %s. Error: %s', dataset_items, e)
            raise RuntimeError('Received invalid dataset items.') from e
        else:
            return posts.root

7. Implement the agent

Now that the required tool is defined, we can start implementing the agent.

First, we handle the Actor input, then create an agent with a given role, assign it a task based on the user query, put the agent into a single-agent crew, and finally execute the crew to perform the task.

The agent is defined in src/main.py.

First, we need to handle Actor input:

actor_input = await Actor.get_input()

query = actor_input.get('query')
model_name = actor_input.get('modelName', 'gpt-4o-mini')
if not query:
    msg = 'Missing "query" attribute in input!'
    raise ValueError(msg)

Define the agent with the tools available in tools.py:

tools = [InstagramScraperTool()]

agent = Agent(
    role='Social Media Analytics Expert',
    goal='Analyze and provide insights about social media profiles and content.',
    backstory='I am an expert social media analyst specializing in Instagram analysis. I help users understand social media data and extract meaningful insights from profiles and posts.',
    tools=tools,
    verbose=True,
    llm=model_name,
)

Create a task from the user query and a crew - a collaborative group of agents working together to achieve a set of tasks:

task = Task(
    description=query,
    expected_output='A helpful response to the user query.',
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])

📌
Note: While total token usage is logged for informational purposes, the pricing is based on the actor-start and task-completed events, not on token usage.

Execute the crew, handle the response, and save the response to the Apify Dataset:

crew_output = crew.kickoff()
raw_response = crew_output.raw

# Log total token usage
Actor.log.info('Total tokens used by the model: %s', crew_output.token_usage.total_tokens)

await Actor.push_data(
    {
        'query': query,
        'response': raw_response,
    }
)
Actor.log.info('Pushed the data into the dataset!')

8. Run the Actor locally

Before pushing the Actor to the Apify platform, test the agent locally using the apify run command, with the Actor input specified and the environment variables set:

OPENAI_API_KEY="your-openai-api-key" apify run --input '{"query": "Analyze the last 2 posts from @openai and summarize the current trends in AI.", "modelName": "gpt-4o-mini"}'

9. Push to Apify

The next step is to deploy the agent Actor to the Apify platform. Note that the Actor is not public after pushing - it needs to be explicitly published.

From the agent-actor directory:

apify push

In the Actor detail page (Source > Code), set the OPENAI_API_KEY environment variable as a secret and rebuild the Actor. See the guide on setting environment variables for detailed instructions.

Deploying an AI agent to Apify

10. Test the agent

Run the agent with this sample query:

Analyze the posts of the @openai and @googledeepmind and summarize me current trends in the AI.

Result stored in an Apify dataset:

AI agent output on the Apify platform
🏹
Troubleshooting

The agent may fail to call the tool, or the scraper itself may fail to scrape Instagram. In that case, check the Actor run logs and the runs triggered on your Apify account to see whether the Instagram Scraper started and finished successfully.

We’ve shown you how to build an AI agent and deploy it to the Apify platform. But there’s more you can do. You can also make money from the agents you publish on Apify Store. That’s what we’ll go through now.

How to monetize your AI agent

Apify’s pay-per-event (PPE) pricing model allows you to charge users based on specific events triggered by your agent (e.g., Actor start, task completed, token usage, …) through the API or SDKs (JS/Python).

Unlike traditional platform usage billing (e.g., compute units or storage), PPE lets you define custom events that align with your Actor, giving you more control over the monetization.

1. Define what the agent will charge for

With PPE, you define the event that the user will be charged for and set its price. For example:

  • Actor start: Charge when the Actor begins running, based on resources like memory.
  • Task completion: Charge when the agent finishes analyzing Instagram posts.
  • Custom events: Charge for specific actions, like calling an external API or processing a certain number of posts.

Users pay for these events. The flexibility of PPE is ideal for Actors like AI agents where the output matters, not the runtime.

Here’s a more specific example. The event below marks the completion of a task - in this case, an Instagram analysis:

{
  "task-completed": {
      "eventTitle": "Task completed",
      "eventDescription": "Flat fee for each completed task.",
      "eventPriceUsd": 0.4
  }
}

2. Charge for events in code

Once the event is defined, you can charge for the event in your code:

# Charge for 5 completed tasks (total price: 5 * $0.40 = $2.00)
await Actor.charge('task-completed', count=5)

When the Actor.charge() function is called, the user running the Actor will be charged for the specified event.

3. Enable PPE in the Actor settings

Choose the pay-per-event monetization model in the Actor's monetization settings (follow the pay-per-event pricing model documentation) and define the events from the pay_per_event.json file if you want to charge on a per-completed-task basis.

Feel free to define custom events based on your needs. Example:

{
    "actor-start": {
        "eventTitle": "Actor start",
        "eventDescription": "Flat fee for starting an Actor run.",
        "eventPriceUsd": 0.1
    },
    "task-completed": {
        "eventTitle": "Price for completing the task",
        "eventDescription": "Flat fee for completing the task.",
        "eventPriceUsd": 0.4
    }
}
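
With both events defined, the cost of a run is simply the sum of each event's price times how many times it was charged. An illustrative calculation using the prices from the example above (Apify computes this server-side when Actor.charge is called):

```python
# Prices from the pay_per_event.json example above:
PRICES_USD = {'actor-start': 0.1, 'task-completed': 0.4}

def run_cost_usd(charged_events: dict[str, int]) -> float:
    return sum(PRICES_USD[name] * count for name, count in charged_events.items())

# One Actor start plus five completed tasks:
cost = run_cost_usd({'actor-start': 1, 'task-completed': 5})
```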

To charge for the Actor start:

# Charge for Actor start
await Actor.charge('actor-start')

# Handle input
actor_input = await Actor.get_input()

To charge for the task completed after the response is generated:

crew_output = crew.kickoff()
raw_response = crew_output.raw

# Log total token usage
Actor.log.info('Total tokens used by the model: %s', crew_output.token_usage.total_tokens)

# Charge for task completion
await Actor.charge("task-completed")

4. Publish the agent

Now that you’ve set the pricing, you’re ready to publish your Actor. Before you do, run through Apify’s publishing checklist.

Once published, don’t forget to maintain your Actor and check for issues.

Ready to build your AI agent?

With Apify and CrewAI, you have everything you need to turn your ideas into powerful, scalable solutions - no server management required!

Here are a few pointers to get you started on creating your first agent:

  • Start with the CrewAI template: Jump straight into action with the Python CrewAI template. It's pre-configured with the essentials - like Instagram scraping and monetization helpers - so you can focus on crafting your agent's unique abilities. Run apify create agent-actor -t python-crewai and start experimenting today!
  • Try other AI templates: Not feeling the CrewAI vibe? Explore a variety of AI-focused templates on the Apify templates page. From LangGraph to LlamaIndex setups, there's a starting point for every vision.
  • Explore existing AI agents: Need inspiration or a ready-made solution? Check out the AI Agents collection on Apify Store. Test drive some LangGraph-based agents, see how others have tackled real-world problems, and adapt their approaches to your own projects.
  • Publish and profit: Once your agent is ready, push it to the Apify platform with apify push and enable monetization. Turn your hard work into a revenue stream. Whether it's for personal projects or a side hustle, Apify makes it easy to share and earn.
