// Step 1: Accessing the Data Import Page
// --------------------------------------
//
// Before setting up your S3 source, you need to open the Data Import page inside your account settings.
//
// 1) At the bottom of the sidebar, locate your email address.
// 2) Click the three-dot menu next to it.
// 3) Select "Settings".
// 4) In the Settings menu, click "Data Import".
//
// Once opened, you’ll see the Data Import Dashboard, which includes:
//
// • Daily Usage – shows how much data has been processed (e.g., “0 MB of 500 MB used today”).
// • S3 Sources – lists all configured S3 buckets for ingestion.
// • Add S3 Source – button to connect a new bucket for automatic ingestion.
// • About Data Import – explains ingestion behavior:
// - Files are automatically discovered and processed hourly.
// - Supported formats: JSON, JSONL, ZIP, TAR.GZ.
// - Maximum file size: 50 MB per file.
// - Daily ingestion limit: 500 MB.
// - Manual ingestion triggers have a 60-second cooldown.
//
// Notes:
// • The Data Import page is available only from the Settings menu (under your user email).
// • You can view total daily usage, add or remove S3 sources, and track ingestion activity from this page.

// Step 2: Configure S3 Upload
// ----------------------------
//
// This step explains how to connect your S3 bucket so the platform can automatically
// import and process your conversation files.
//
// 1) Create or select an existing S3 bucket in your AWS account.
// 2) Store all your agent conversation files in that bucket.
// 3) Share the bucket details with the Avon AI team (bucket name and access permissions).
// 4) In the Avon AI platform, go to Settings → Data Import → Add S3 Source.
// 5) Enter the full bucket path (e.g., my-bucket/ or my-bucket/conversations/).
// 6) Click "Add Source" to connect your bucket.
//
// Tip: Use a consistent prefix like "conversations/" to keep imported data organized.
//
// Example bucket path:
// s3://my-bucket/conversations/
//
// JSON Format Example (each file should follow this structure):
//
// [
//   {
//     "conversation_id": "avon-ai-example-conv-001",
//     "timestamp": "2025-01-15T14:30:00.000Z",
//     "agent_id": "agent-avon-ai-001",
//     "is_resolved": true,
//     "csat_score": 4.5,
//     "missing_info": {
//       "reason": "customer_privacy_preference",
//       "fields": ["phone_number"]
//     },
//     "user_data": {
//       "account_type": "premium",
//       "subscription_tier": "professional",
//       "preferences": {
//         "language": "en-US",
//         "notifications": true
//       },
//       "last_activity": "2025-01-14T14:30:00.000Z"
//     },
//     "messages": [
//       {
//         "message_id": "msg-001",
//         "conversation_id": "avon-ai-example-conv-001",
//         "role": "customer",
//         "sender_id": "user-001",
//         "content": "Hello, I need help with my account billing. I see some charges I don't understand.",
//         "timestamp": "2025-01-15T14:30:00.000Z",
//         "logs": [
//           {
//             "action": "user_input",
//             "source": "web_chat",
//             "session_id": "sess-001"
//           }
//         ]
//       },
//       {
//         "message_id": "msg-002",
//         "conversation_id": "avon-ai-example-conv-001",
//         "role": "agent",
//         "sender_id": "agent-avon-ai-001",
//         "content": "I'd be happy to help you understand your billing charges. Let me review your account details.",
//         "timestamp": "2025-01-15T14:30:30.000Z",
//         "logs": [
//           {
//             "action": "account_lookup",
//             "status": "success",
//             "duration_ms": 245,
//             "records_found": 3
//           }
//         ]
//       },
//       {
//         "message_id": "msg-003",
//         "conversation_id": "avon-ai-example-conv-001",
//         "role": "customer",
//         "sender_id": "user-001",
//         "content": "Thank you! That breakdown was very helpful.",
//         "timestamp": "2025-01-15T14:32:00.000Z",
//         "logs": [
//           {
//             "action": "satisfaction_indicated",
//             "confidence": 0.95
//           }
//         ]
//       }
//     ]
//   }
// ]
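// The structure above can be sanity-checked before upload. Below is a minimal
// validator sketch in Python; the field names follow the example above, while the
// platform's exact server-side validation rules are an assumption here:

```python
import json
from datetime import datetime

# Fields taken from the example format above (assumed required).
REQUIRED_TOP_LEVEL = ("conversation_id", "timestamp", "agent_id", "messages")
REQUIRED_MESSAGE = ("message_id", "role", "content", "timestamp")


def is_iso8601(value: str) -> bool:
    """Accept ISO-8601 timestamps such as 2025-01-15T14:30:00.000Z."""
    try:
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except (ValueError, AttributeError):
        return False


def validate_conversation(conv: dict) -> list:
    """Return a list of human-readable problems (empty list = looks OK)."""
    problems = []
    for field in REQUIRED_TOP_LEVEL:
        if field not in conv:
            problems.append(f"missing top-level field: {field}")
    if "timestamp" in conv and not is_iso8601(conv["timestamp"]):
        problems.append("timestamp is not ISO-8601")
    for i, msg in enumerate(conv.get("messages", [])):
        for field in REQUIRED_MESSAGE:
            if field not in msg:
                problems.append(f"message {i}: missing {field}")
    return problems


def validate_file(path: str) -> list:
    """Validate a JSON array file of conversations before uploading to S3."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    problems = []
    for conv in data:
        problems.extend(validate_conversation(conv))
    return problems
```

// Running this over each file before upload catches most "Invalid JSON format"
// rejections (see the troubleshooting notes in Step 3) without burning daily quota.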
//
// Notes:
// • Files must be in JSON or JSONL format.
// • Each file must include a valid "agent_id".
// • Maximum file size: 50 MB per file; daily limit: 500 MB.
// • Files are automatically detected and processed hourly.
// • To verify successful ingestion, return to the Data Import page to view usage and status.
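// Since both JSON and JSONL are accepted, an array-of-conversations file can be
// flattened to JSONL (one conversation object per line) with a short script — a
// sketch, assuming the array layout shown above:

```python
import json


def json_array_to_jsonl(src_path: str, dest_path: str) -> int:
    """Convert a JSON array file into JSONL (one conversation per line).

    Returns the number of conversations written.
    """
    with open(src_path, encoding="utf-8") as fh:
        conversations = json.load(fh)
    with open(dest_path, "w", encoding="utf-8") as out:
        for conv in conversations:
            # One compact JSON object per line; keep non-ASCII text as-is.
            out.write(json.dumps(conv, ensure_ascii=False) + "\n")
    return len(conversations)
```

// JSONL is convenient for large exports: each line stays independently parseable,
// so one malformed conversation does not invalidate the whole file.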

// Step 3: Configure Bucket Permissions & Verify Ingestion
// -------------------------------------------------------
//
// After adding your S3 source in the platform, you must grant the Avon AI
// ingestion service permissions to read objects from your bucket/prefix.
// Then, drop a sample file to verify that ingestion runs successfully.

// 1) Grant S3 Permissions (IAM Policy on the Bucket)
// --------------------------------------------------
// Attach a bucket policy that allows read access to the specific prefix you configured.
// Replace <YOUR_BUCKET> and <YOUR_PREFIX/> with your actual values, and <INGESTION_AWS_PRINCIPAL>
// with the ARN provided by Avon AI (ask Support/CSM if you don’t have it).

/*
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAvonAIListBucket",
      "Effect": "Allow",
      "Principal": { "AWS": "<INGESTION_AWS_PRINCIPAL>" },
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<YOUR_BUCKET>",
      "Condition": {
        "StringLike": {
          "s3:prefix": ["<YOUR_PREFIX>/*"]
        }
      }
    },
    {
      "Sid": "AllowAvonAIReadObjects",
      "Effect": "Allow",
      "Principal": { "AWS": "<INGESTION_AWS_PRINCIPAL>" },
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::<YOUR_BUCKET>/<YOUR_PREFIX>/*"
    }
  ]
}
*/

// Tips:
// • Scope permissions to the exact prefix you use (e.g., conversations/).
// • If you use SSE-KMS, ensure the ingestion principal has decrypt permissions for the CMK.
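// The policy above can be templated so the bucket, prefix, and principal are filled
// in consistently — a sketch (the principal ARN itself still comes from Avon AI;
// the function name is illustrative):

```python
import json


def render_bucket_policy(bucket: str, prefix: str, principal: str) -> str:
    """Render the read-only bucket policy for the given bucket/prefix/principal."""
    prefix = prefix.rstrip("/")  # normalize; the policy re-adds the "/" itself
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAvonAIListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "AllowAvonAIReadObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

// Templating avoids the two most common mistakes from the troubleshooting list:
// a mismatched prefix and a typo in the Resource ARNs.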

// 2) (Optional) Restrict by Source VPC/Condition
// ----------------------------------------------
// If your security policy requires, add conditions like aws:SourceVpce or aws:PrincipalOrgID.
// Consult your security team for your environment-specific constraints.

// 3) Drop a Sample File to the Bucket
// -----------------------------------
// Upload a small JSON or JSONL file to: s3://<YOUR_BUCKET>/<YOUR_PREFIX>/
// Use the validated conversation format from Step 2.

// Example test object name:
// s3://my-bucket/conversations/sample-conv-001.json

// 4) Verify Ingestion in the Platform
// -----------------------------------
// • Go to Settings → Data Import.
// • Confirm the Daily Usage increases after the next hourly scan.
// • Check S3 Sources → last discovery time.
// • Navigate to AI Admin Panel → Conversation Analysis and search by conversation_id/agent_id.

// 5) Common Troubleshooting
// -------------------------
// • No files detected:
// - Verify the bucket path/prefix matches exactly what you configured in “Add S3 Source”.
// - Ensure the file extension is .json or .jsonl (ZIP/TAR.GZ also supported).
// - Wait for the hourly discovery window (or trigger manual ingestion if available).
//
// • Access denied:
// - Re-check the bucket policy Principal (INGESTION_AWS_PRINCIPAL) and Resources ARNs.
// - Confirm the prefix in the policy is correct (trailing slash).
//
// • Invalid JSON format:
// - Re-run validation on a smaller sample (see manual upload recipe).
// - Ensure ISO-8601 timestamps and a valid agent_id.
//
// Notes:
// • Max file size: 50 MB per file; daily limit: 500 MB.
// • Supported formats: JSON, JSONL, ZIP, TAR.GZ.
// • Keep a consistent folder structure (e.g., conversations/YYYY/MM/DD/) for easier auditing.

// Step 4: Monitoring & Logs
// --------------------------
//
// Once your S3 ingestion is active, the platform automatically checks your bucket hourly.
//
// 1) Go to Settings → Data Import.
// 2) Under S3 Sources, check the “Last Discovery” timestamp.
// 3) Review Daily Usage to monitor how much data has been processed today.
// 4) If ingestion fails, you’ll see a status icon or error message (hover to view details).
//
// You can also refresh ingestion manually by clicking "Refresh".
// This triggers an on-demand scan (available once every 60 seconds).
//
// Notes:
// • Each file’s status (discovered, ingested, failed) is logged internally.
// • Only new or updated files are re-processed on the next scan.
// • Use consistent naming patterns (e.g., conversations/YYYY/MM/DD/) to track ingestion progress.

// Step 5: Retention & Lifecycle
// ------------------------------
//
// To keep your storage clean and efficient, define lifecycle rules in your S3 bucket.
//
// 1) In AWS S3, go to your bucket → Management → Lifecycle rules.
// 2) Create a new rule named "AvonAI Data Retention".
// 3) Example best practices:
// - Move old files (>30 days) to Glacier or Deep Archive.
// - Permanently delete files older than 90 days (optional).
// - Exclude prefixes still used for active ingestion.
//
// Benefits:
// • Keeps costs low by archiving old conversation data.
// • Ensures Avon AI always processes the latest information.
// • Prevents clutter and duplicates in ingestion scans.
//
// Notes:
// • Avon AI does not automatically delete S3 files — retention is managed in your AWS account.
// • Keep at least 7 days of history for troubleshooting and verification.
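// The rule described above can also be expressed as the configuration document that
// S3's lifecycle API expects — a sketch using the payload shape of boto3's
// put_bucket_lifecycle_configuration, with the example day counts from this step:

```python
def avonai_retention_rule(prefix: str = "conversations/",
                          archive_after_days: int = 30,
                          delete_after_days: int = 90) -> dict:
    """Build a lifecycle configuration: archive to Glacier, then expire."""
    return {
        "Rules": [
            {
                "ID": "AvonAI Data Retention",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                # Optional — remove this key if you never want S3 to delete files.
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


# Applying it requires AWS credentials (shown for context only):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=avonai_retention_rule())
```

// Keep archive_after_days comfortably above the hourly discovery window plus your
// troubleshooting buffer (at least 7 days, per the note above).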

// Step 6: De-duplication & Idempotency
// ------------------------------------
//
// Avon AI prevents duplicate conversations from being processed multiple times,
// but following good data hygiene practices ensures clean ingestion.
//
// Best Practices:
// • Ensure each conversation file contains a unique "conversation_id".
// • Use stable IDs — if the same conversation is re-uploaded with the same ID, it will be ignored.
// • Avoid uploading the same file under multiple prefixes.
// • Use timestamps in filenames to make tracking easier (e.g., conv-2025-01-15-001.json).
//
// Notes:
// • Avon AI performs internal idempotency checks per agent_id + conversation_id.
// • If duplicates are detected, only the latest valid record is retained.
// • For recurring ingestion, use versioned or timestamped folders to avoid overlap.
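// The idempotency key described above (agent_id + conversation_id) can be mirrored
// client-side to drop duplicates before they are ever uploaded — a sketch, assuming
// "latest" is decided by the conversation's ISO-8601 timestamp:

```python
def dedupe_conversations(conversations: list) -> list:
    """Keep the latest record per (agent_id, conversation_id).

    Mirrors the platform's idempotency key; ISO-8601 timestamps sort
    correctly as plain strings, so a string comparison suffices.
    """
    latest = {}
    for conv in conversations:
        key = (conv.get("agent_id"), conv.get("conversation_id"))
        prev = latest.get(key)
        if prev is None or conv.get("timestamp", "") > prev.get("timestamp", ""):
            latest[key] = conv
    return list(latest.values())
```

// Running this before upload keeps re-exports from consuming daily quota on
// conversations the platform would ignore anyway.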

// Step 7: Disable or Remove Source
// --------------------------------
//
// If you need to temporarily pause or permanently remove an S3 source:
//
// 1) Go to Settings → Data Import.
// 2) Under S3 Sources, locate the source you wish to disable.
// 3) Click the three-dot menu next to it.
// 4) Choose "Disable" to pause ingestion or "Remove" to delete the source.
//
// Behavior:
// • Disabled sources will stop ingestion but remain listed for reactivation.
// • Removed sources will delete the configuration permanently (no data loss in S3).
// • You can add the same bucket again later using "Add S3 Source".
//
// Notes:
// • Disabling is recommended if you need to pause ingestion for maintenance or testing.
// • Removing is recommended only when the bucket is no longer used or replaced.

// Final Checklist — Before You Go Live
// ------------------------------------
//
// Run through this list to ensure your S3 ingestion is correctly configured and reliable.

// 1) Agent Mapping
// • Confirm each file includes a valid "agent_id" that exists in the platform.
// • Spot-check 1–2 files to ensure the agent_id matches the intended agent.

// 2) JSON Schema & Encoding
// • Validate your JSON/JSONL with a linter.
// • Verify ISO-8601 timestamps (YYYY-MM-DDTHH:MM:SSZ).
// • Ensure UTF-8 encoding and no stray BOM characters.

// 3) S3 Pathing & Naming
// • Use a consistent prefix (e.g., conversations/YYYY/MM/DD/).
// • Filenames are unique and stable; include dates or IDs to avoid collisions.
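// A date-partitioned, collision-free key in the conversations/YYYY/MM/DD/ layout
// recommended above can be derived from the conversation itself — a sketch (the
// helper name is illustrative):

```python
from datetime import datetime


def object_key(conv: dict, prefix: str = "conversations") -> str:
    """Build a stable S3 key from the conversation's own timestamp and ID,
    e.g. conversations/2025/01/15/avon-ai-example-conv-001.json."""
    ts = datetime.fromisoformat(conv["timestamp"].replace("Z", "+00:00"))
    return f"{prefix}/{ts:%Y/%m/%d}/{conv['conversation_id']}.json"
```

// Deriving the key from conversation_id keeps filenames unique and stable across
// re-exports, which also plays well with the de-duplication behavior in Step 6.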

// 4) Permissions & Security
// • Bucket policy grants ListBucket (scoped to prefix) and GetObject to the ingestion principal.
// • (If used) KMS permissions allow decrypt for the same principal.
// • No write/delete permissions granted to the ingestion principal (read-only).
// • PII/Compliance: decide what to redact via `missing_info` and enable relevant validators.

// 5) Data Import Configuration
// • Settings → Data Import → S3 Source is added and shows the correct bucket path.
// • Source status is “Enabled”; Last Discovery timestamp is recent.
// • Daily limits understood (e.g., 500 MB) and file size ≤ 50 MB.

// 6) Dry Run / Sample Ingestion
// • Upload a small sample file to the configured prefix.
// • Wait for the hourly scan (or use Refresh if available).
// • Verify Daily Usage increases and the file is listed as discovered/ingested.

// 7) Post-Ingestion Verification
// • Go to AI Admin Panel → Conversation Analysis.
// • Find the sample by conversation_id/agent_id and confirm messages, logs, and user_data render correctly.
// • Review any validation alerts or quality scores.

// 8) De-duplication & Idempotency
// • Ensure each conversation has a unique `conversation_id`.
// • Re-upload test with the same ID to confirm the platform ignores duplicates (or behaves as expected).

// 9) Monitoring & Alerts
// • Decide who monitors Data Import (owners, frequency).
// • (If available) Configure alerting for failures/timeouts.
// • Document a quick “what to check first” when ingestion fails.

// 10) Retention & Lifecycle
// • S3 Lifecycle rules in place (archive/delete after X days).
// • Keep at least 7 days of history for troubleshooting.

// 11) Disable/Remove Playbook
// • Know how to pause ingestion (Disable) during maintenance.
// • Know how to Remove the source safely without affecting existing S3 data.

// Go-live gate:
// • Sample files ingested and visible in Conversation Analysis.
// • No schema errors; validators behave as expected.
// • Team acknowledges limits, monitoring plan, and rollback steps.
