
Set Automatic Conversation Ingestion via S3


Step 1: Access the Data Import Page

Before setting up your S3 source, you need to open the Data Import page inside your account settings.

  1. At the bottom of the sidebar, locate your email address.
  2. Click the three-dot menu (⋮) next to it.
  3. Select "Settings".
  4. In the Settings menu, click "Data Import".

Once opened, you’ll see the Data Import Dashboard, which includes:

  • Daily Usage – shows how much data has been processed (e.g., “0 MB of 500 MB used today”).

  • S3 Sources – lists all configured S3 buckets for ingestion.

  • Add S3 Source – button to connect a new bucket for automatic ingestion.

  • About Data Import – explains ingestion behavior:

    • Files are automatically discovered and processed hourly.
    • Supported formats: JSON, JSONL, ZIP, TAR.GZ.
    • Maximum file size: 50 MB per file.
    • Daily ingestion limit: 500 MB.
    • Manual ingestion triggers have a 60-second cooldown.

Notes:

    The Data Import page is available only from the Settings menu (under your user email). You can view total daily usage, add or remove S3 sources, and track ingestion activity from this page.

Step 2: Configure S3 Upload

This step explains how to connect your S3 bucket so the platform can automatically import and process your conversation files.

  1. Create or select an existing S3 bucket in your AWS account.
  2. Store all your agent conversation files in that bucket.
  3. Share the bucket details with the Avon AI team (bucket name and access permissions).
  4. In the Avon AI platform, go to Settings → Data Import → Add S3 Source.
  5. Enter the full bucket path (e.g., my-bucket/ or my-bucket/conversations/).
  6. Click "Add Source" to connect your bucket.

Tip: Use a consistent prefix like "conversations/" to keep imported data organized.

Example bucket path: s3://my-bucket/conversations/

JSON Format Example (each file should follow this structure):
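
The exact schema is defined by the Avon AI platform; the example below is only an illustrative sketch. agent_id, conversation_id, and ISO-8601 timestamps are referenced elsewhere in this guide; all other field names are assumptions.

```json
{
  "conversation_id": "conv-2025-01-15-001",
  "agent_id": "agent-123",
  "messages": [
    { "role": "user", "content": "Hi, I need help with my order.", "timestamp": "2025-01-15T10:00:00Z" },
    { "role": "assistant", "content": "Sure, what is your order number?", "timestamp": "2025-01-15T10:00:05Z" }
  ]
}
```

For JSONL, place one such JSON object per line.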


Notes:

  • Files must be in JSON or JSONL format.
  • Each file must include a valid "agent_id".
  • Maximum file size: 50 MB per file; daily limit: 500 MB.
  • Files are automatically detected and processed hourly.
  • To verify successful ingestion, return to the Data Import page to view usage and status.
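
Before uploading, you can sanity-check files locally. The sketch below is a hypothetical helper (not part of the platform) that enforces the constraints listed above: allowed extensions, the 50 MB per-file limit, parseable JSON, and a non-empty agent_id.

```python
import json
import os

MAX_FILE_BYTES = 50 * 1024 * 1024  # 50 MB per-file limit


def validate_conversation_file(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks ingestible."""
    problems = []
    if not path.endswith((".json", ".jsonl")):
        problems.append("extension must be .json or .jsonl")
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file exceeds the 50 MB per-file limit")
    with open(path, encoding="utf-8") as f:
        # JSONL holds one JSON object per line; plain JSON holds a single object.
        records = f.read().splitlines() if path.endswith(".jsonl") else [f.read()]
    for i, raw in enumerate(records, start=1):
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"record {i}: invalid JSON ({exc})")
            continue
        if not record.get("agent_id"):
            problems.append(f"record {i}: missing or empty agent_id")
    return problems
```

Run it over a sample before dropping files into the bucket; any returned problem would likely cause the hourly scan to reject the file.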

Step 3: Configure Bucket Permissions & Verify Ingestion

After adding your S3 source in the platform, you must grant the Avon AI ingestion service permissions to read objects from your bucket/prefix. Then, drop a sample file to verify that ingestion runs successfully.

1. Grant S3 Permissions (IAM Policy on the Bucket)

  • Attach a bucket policy that allows read access to the specific prefix you configured.
  • Replace YOUR_BUCKET, YOUR_PREFIX, and INGESTION_AWS_PRINCIPAL with your actual values and the ARN provided by Avon AI (ask Support/CSM if you don’t have it).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAvonAIListBucket",
      "Effect": "Allow",
      "Principal": { "AWS": "<INGESTION_AWS_PRINCIPAL>" },
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<YOUR_BUCKET>",
      "Condition": {
        "StringLike": {
          "s3:prefix": ["<YOUR_PREFIX>/*"]
        }
      }
    },
    {
      "Sid": "AllowAvonAIReadObjects",
      "Effect": "Allow",
      "Principal": { "AWS": "<INGESTION_AWS_PRINCIPAL>" },
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::<YOUR_BUCKET>/<YOUR_PREFIX>/*"
    }
  ]
}

Tips:

  • Scope permissions to the exact prefix you use (e.g., conversations/).
  • If you use SSE-KMS, ensure the ingestion principal has decrypt permissions for the CMK.

2. (Optional) Restrict by Source VPC/Condition

If your security policy requires it, add conditions such as aws:SourceVpce or aws:PrincipalOrgID. Consult your security team for your environment-specific constraints.
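
As an illustration only, a fragment like the following (the org ID is a placeholder) could be merged into a statement's Condition block; aws:PrincipalOrgID and aws:SourceVpce are standard AWS global condition keys.

```json
"Condition": {
  "StringEquals": {
    "aws:PrincipalOrgID": "o-exampleorgid"
  }
}
```

Note that conditions restricting access to your own organization or VPC endpoints can also block the external ingestion principal, so verify ingestion still works after any such change.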

3. Drop a Sample File to the Bucket

Upload a small JSON or JSONL file to: s3://<YOUR_BUCKET>/<YOUR_PREFIX>/

Use the validated conversation format from Step 2.

Example test object name:

s3://my-bucket/conversations/sample-conv-001.json

4. Verify Ingestion in the Platform

  • Go to Settings → Data Import.
  • Confirm that Daily Usage increases after the next hourly scan.
  • Check S3 Sources → last discovery time.
  • Navigate to AI Admin Panel → Conversation Analysis and search by conversation_id/agent_id.

5. Common Troubleshooting

No files detected:

  • Verify the bucket path/prefix matches exactly what you configured in “Add S3 Source”.
  • Ensure the file extension is .json or .jsonl (ZIP/TAR.GZ also supported).
  • Wait for the hourly discovery window (or trigger manual ingestion if available).

Access denied:

  • Re-check the bucket policy Principal (INGESTION_AWS_PRINCIPAL) and Resource ARNs.
  • Confirm the prefix in the policy is correct (including the trailing slash).

Invalid JSON format:

  • Re-run validation on a smaller sample (see manual upload recipe).
  • Ensure ISO-8601 timestamps and a valid agent_id.

Notes:

  • Max file size: 50 MB per file; daily limit: 500 MB.
  • Supported formats: JSON, JSONL, ZIP, TAR.GZ.
  • Keep a consistent folder structure (e.g., conversations/YYYY/MM/DD/) for easier auditing.

Step 4: Monitoring & Logs

Once your S3 ingestion is active, the platform automatically checks your bucket hourly.

  1. Go to Settings → Data Import.
  2. Under S3 Sources, check the “Last Discovery” timestamp.
  3. Review Daily Usage to monitor how much data has been processed today.
  4. If ingestion fails, you’ll see a status icon or error message (hover to view details).

You can also refresh ingestion manually by clicking Refresh. This triggers an on-demand scan (available once every 60 seconds).

Notes:

  • Each file’s status (discovered, ingested, failed) is logged internally.
  • Only new or updated files are re-processed on the next scan.
  • Use consistent naming patterns (e.g., conversations/YYYY/MM/DD/) to track ingestion progress.

Step 5: Retention & Lifecycle

To keep your storage clean and efficient, define lifecycle rules in your S3 bucket.

  1. In AWS S3, go to your bucket → Management → Lifecycle rules.
  2. Create a new rule named "AvonAI Data Retention".
  3. Example best practices:
  • Move old files (>30 days) to Glacier or Deep Archive.
  • Permanently delete files older than 90 days (optional).
  • Exclude prefixes still used for active ingestion.
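
AWS also accepts lifecycle rules as JSON (for example via the S3 PutBucketLifecycleConfiguration API). The sketch below mirrors the best practices above; the prefix and day counts are placeholders to adapt to your environment.

```json
{
  "Rules": [
    {
      "ID": "AvonAI Data Retention",
      "Filter": { "Prefix": "conversations/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 90 }
    }
  ]
}
```

If the expiration prefix overlaps a prefix still used for active ingestion, files may disappear before they are processed, so scope the Filter carefully.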

Benefits:

  • Keeps costs low by archiving old conversation data.
  • Ensures Avon AI always processes the latest information.
  • Prevents clutter and duplicates in ingestion scans.

Notes:

  • Avon AI does not automatically delete S3 files — retention is managed in your AWS account.
  • Keep at least 7 days of history for troubleshooting and verification.

Step 6: De-duplication & Idempotency

Avon AI prevents duplicate conversations from being processed multiple times, but following good data hygiene practices ensures clean ingestion.

Best Practices:

  • Ensure each conversation file contains a unique "conversation_id".
  • Use stable IDs — if the same conversation is re-uploaded with the same ID, it will be ignored.
  • Avoid uploading the same file under multiple prefixes.
  • Use timestamps in filenames to make tracking easier (e.g., conv-2025-01-15-001.json).

Notes:

  • Avon AI performs internal idempotency checks per agent_id + conversation_id.
  • If duplicates are detected, only the latest valid record is retained.
  • For recurring ingestion, use versioned or timestamped folders to avoid overlap.
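
The idempotency behavior described above can be mimicked client-side before upload. This sketch is illustrative only (it is not the platform's internal logic): it keys records on (agent_id, conversation_id) and keeps the latest record by timestamp.

```python
def deduplicate(records: list[dict]) -> list[dict]:
    """Keep one record per (agent_id, conversation_id), preferring the latest timestamp."""
    latest: dict[tuple[str, str], dict] = {}
    for rec in records:
        key = (rec["agent_id"], rec["conversation_id"])
        # ISO-8601 UTC timestamps compare correctly as plain strings.
        if key not in latest or rec.get("timestamp", "") > latest[key].get("timestamp", ""):
            latest[key] = rec
    return list(latest.values())
```

Running a pass like this over a batch before upload avoids wasting the daily ingestion quota on records the platform would discard anyway.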

Step 7: Disable or Remove Source

If you need to temporarily pause or permanently remove an S3 source:

  1. Go to Settings → Data Import.
  2. Under S3 Sources, locate the source you wish to disable.
  3. Click the three-dot menu (⋮) next to it.
  4. Choose "Disable" to pause ingestion or "Remove" to delete the source.

Behavior:

  • Disabled sources will stop ingestion but remain listed for reactivation.
  • Removed sources will delete the configuration permanently (no data loss in S3).
  • You can add the same bucket again later using "Add S3 Source".

Notes:

  • Disabling is recommended if you need to pause ingestion for maintenance or testing.
  • Removing is recommended only when the bucket is no longer used or replaced.

Final Checklist — Before You Go Live

Run through this list to ensure your S3 ingestion is correctly configured and reliable.

  1. Agent Mapping
  • Confirm each file includes a valid "agent_id" that exists in the platform.
  • Spot-check 1–2 files to ensure the agent_id matches the intended agent.
  2. JSON Schema & Encoding
  • Validate your JSON/JSONL with a linter.
  • Verify ISO-8601 timestamps (YYYY-MM-DDTHH:MM:SSZ).
  • Ensure UTF-8 encoding and no stray BOM characters.
  3. S3 Pathing & Naming
  • Use a consistent prefix (e.g., conversations/YYYY/MM/DD/).
  • Keep filenames unique and stable; include dates or IDs to avoid collisions.
  4. Permissions & Security
  • Bucket policy grants ListBucket (scoped to the prefix) and GetObject to the ingestion principal.
  • (If used) KMS permissions allow decrypt for the same principal.
  • No write/delete permissions granted to the ingestion principal (read-only).
  • PII/Compliance: decide what to redact via missing_info and enable relevant validators.
  5. Data Import Configuration
  • Settings → Data Import → S3 Source is added and shows the correct bucket path.
  • Source status is “Enabled”; Last Discovery timestamp is recent.
  • Daily limits understood (e.g., 500 MB) and file size ≤ 50 MB.
  6. Dry Run / Sample Ingestion
  • Upload a small sample file to the configured prefix.
  • Wait for the hourly scan (or use Refresh if available).
  • Verify Daily Usage increases and the file is listed as discovered/ingested.
  7. Post-Ingestion Verification
  • Go to AI Admin Panel → Conversation Analysis.
  • Find the sample by conversation_id/agent_id and confirm messages, logs, and user_data render correctly.
  • Review any validation alerts or quality scores.
  8. De-duplication & Idempotency
  • Ensure each conversation has a unique conversation_id.
  • Re-upload a test file with the same ID to confirm the platform ignores duplicates (or behaves as expected).
  9. Monitoring & Alerts
  • Decide who monitors Data Import (owners, frequency).
  • (If available) Configure alerting for failures/timeouts.
  • Document a quick “what to check first” list for when ingestion fails.
  10. Retention & Lifecycle
  • S3 Lifecycle rules are in place (archive/delete after X days).
  • Keep at least 7 days of history for troubleshooting.
  11. Disable/Remove Playbook
  • Know how to pause ingestion (Disable) during maintenance.
  • Know how to Remove the source safely without affecting existing S3 data.
  12. Go-Live Gate
  • Sample files ingested and visible in Conversation Analysis.
  • No schema errors; validators behave as expected.
  • Team acknowledges limits, monitoring plan, and rollback steps.