initial

2025-11-05 00:24:05 +00:00
commit b8856c0660
1157 changed files with 26817 additions and 0 deletions
--- a/docs/ARC_V1_TASK_IDS_README.md
+++ b/docs/ARC_V1_TASK_IDS_README.md
@@ -0,0 +1,118 @@
+# ARC-AGI Version 1 Task IDs
+
+## Summary
+
+This directory contains the official list of **800 task IDs** from the original ARC-AGI Version 1 dataset.
+
+- **Source**: [fchollet/ARC-AGI](https://github.com/fchollet/ARC-AGI) v1.0.2
+- **Training Tasks**: 400
+- **Evaluation Tasks**: 400
+- **Total**: 800 tasks
+
+## Files Generated
+
+1. **arc_v1_official_task_ids.json** - Complete structured JSON with all V1 task IDs (the full list is under the `all_task_ids` key)
+2. **arc_v1_all_ids.txt** - Simple text file with all task IDs
+3. **arc_v1_training_ids.txt** - Training task IDs only (400 tasks)
+4. **arc_v1_evaluation_ids.txt** - Evaluation task IDs only (400 tasks)
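+
+The text files should agree with one another: 400 training IDs and 400 evaluation IDs with no overlap, which together make up the 800 IDs in `arc_v1_all_ids.txt`. The consistency check below is a minimal sketch. It assumes each `.txt` file holds one task ID per line and that every ID is 8 lowercase hexadecimal characters, like the samples further down. `load_id_file` and `validate_splits` are hypothetical helpers, not part of any shipped script.
+
+```python
+import re
+from pathlib import Path
+from typing import Iterable
+
+# Assumption: task IDs are 8 lowercase hex characters (e.g. "007bbfb7").
+TASK_ID_RE = re.compile(r"[0-9a-f]{8}")
+
+
+def parse_ids(lines: Iterable[str]) -> list[str]:
+    """Strip whitespace and skip blank lines (assumes one ID per line)."""
+    return [line.strip() for line in lines if line.strip()]
+
+
+def load_id_file(path: str) -> list[str]:
+    """Read a one-ID-per-line text file such as arc_v1_training_ids.txt."""
+    return parse_ids(Path(path).read_text(encoding="utf-8").splitlines())
+
+
+def validate_splits(training: list[str], evaluation: list[str],
+                    all_ids: list[str], expected_each: int = 400) -> list[str]:
+    """Return human-readable problems; an empty list means the files agree."""
+    problems = []
+    for name, ids in (("training", training), ("evaluation", evaluation)):
+        if len(ids) != expected_each:
+            problems.append(f"{name}: expected {expected_each} IDs, found {len(ids)}")
+        if len(set(ids)) != len(ids):
+            problems.append(f"{name}: contains duplicate IDs")
+    overlap = set(training) & set(evaluation)
+    if overlap:
+        problems.append(f"{len(overlap)} ID(s) appear in both splits")
+    if set(training) | set(evaluation) != set(all_ids):
+        problems.append("all IDs != training IDs + evaluation IDs")
+    malformed = [i for i in all_ids if not TASK_ID_RE.fullmatch(i)]
+    if malformed:
+        problems.append(f"{len(malformed)} ID(s) are not 8-char lowercase hex, e.g. {malformed[0]!r}")
+    return problems
+```
+
+Usage: `validate_splits(load_id_file("arc_v1_training_ids.txt"), load_id_file("arc_v1_evaluation_ids.txt"), load_id_file("arc_v1_all_ids.txt"))` returns an empty list when the three files are consistent.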
+
+## Key Findings About Your Dataset
+
+Your local `arc_data` directory contains:
+- **Training**: 1,000 tasks (600 more than V1)
+- **Evaluation**: 120 tasks (280 fewer than V1)
+- **Total**: 1,120 tasks
+
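+This indicates your dataset is **not** the original V1. Its split sizes match the public ARC-AGI-2 release (1,000 training and 120 evaluation tasks), so it likely contains:
+- Training data that extends well beyond the 400 V1 training tasks
+- A smaller evaluation set than V1's 400 tasks
+- A mix of V1 task IDs and newer ones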
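+
+You can reproduce the split sizes above by counting task files. The sketch below assumes `arc_data` follows the fchollet/ARC-AGI layout, with one `<task_id>.json` file per task under `training/` and `evaluation/` subdirectories. `local_task_ids` and `split_counts` are hypothetical helpers. If your directory is organized differently, adjust `SPLITS`.
+
+```python
+from pathlib import Path
+
+# Assumption: one <task_id>.json file per task, as in the fchollet/ARC-AGI data/ folder.
+SPLITS = ("training", "evaluation")
+
+
+def local_task_ids(arc_dir: str) -> dict[str, set[str]]:
+    """Map each split name to the set of task IDs (JSON file stems) found in it."""
+    root = Path(arc_dir)
+    return {split: {p.stem for p in (root / split).glob("*.json")} for split in SPLITS}
+
+
+def split_counts(arc_dir: str) -> dict[str, int]:
+    """Count tasks per split plus an overall total, for comparison with V1's 400/400/800."""
+    counts = {split: len(ids) for split, ids in local_task_ids(arc_dir).items()}
+    counts["total"] = sum(counts.values())
+    return counts
+```
+
+If the layout assumption holds, `split_counts("arc_data")` should report the 1,000 / 120 / 1,120 figures listed above.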
+
+## How to Identify V1 Tasks in Your Dataset
+
+Use the task IDs in `arc_v1_official_task_ids.json` as a reference:
+
+```python
+import json
+
+# Load official V1 IDs
+with open('arc_v1_official_task_ids.json', 'r') as f:
+    v1_data = json.load(f)
+    v1_task_ids = set(v1_data['all_task_ids'])
+
+# Check if a task is V1
+task_id = "007bbfb7"
+is_v1 = task_id in v1_task_ids
+print(f"Task {task_id} is V1: {is_v1}")
+```
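+
+A single membership test only tells you whether an ID is V1. To find out which V1 split a local task came from, check the training and evaluation lists separately. This is useful, for example, for spotting V1 evaluation tasks that ended up in a local training split. A minimal sketch follows. It assumes the two sets are loaded from `arc_v1_training_ids.txt` and `arc_v1_evaluation_ids.txt`. `classify` and `summarize` are hypothetical names, and the example uses only a few of the sample IDs listed in this README.
+
+```python
+from collections import Counter
+from typing import Iterable
+
+
+def classify(task_id: str, v1_training: set[str], v1_evaluation: set[str]) -> str:
+    """Label a task ID by the V1 split it belongs to, or 'not_v1'."""
+    if task_id in v1_training:
+        return "v1_training"
+    if task_id in v1_evaluation:
+        return "v1_evaluation"
+    return "not_v1"
+
+
+def summarize(local_ids: Iterable[str], v1_training: set[str],
+              v1_evaluation: set[str]) -> Counter:
+    """Count how many local task IDs fall under each label."""
+    return Counter(classify(t, v1_training, v1_evaluation) for t in local_ids)
+
+
+# Tiny illustrative subsets of the real lists; "deadbeef" is a made-up ID.
+v1_train = {"007bbfb7", "00d62c1b"}
+v1_eval = {"00576224", "009d5c81"}
+print(summarize(["007bbfb7", "00576224", "deadbeef"], v1_train, v1_eval))
+```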
+
+## Sample V1 Training Task IDs
+
+```
+007bbfb7
+00d62c1b
+017c7c7b
+025d127b
+045e512c
+0520fde7
+05269061
+05f2a901
+06df4c85
+08ed6ac7
+```
+
+## Sample V1 Evaluation Task IDs
+
+```
+00576224
+009d5c81
+00dbd492
+03560426
+05a7bcf2
+0607ce86
+0692e18c
+070dd51e
+08573cc6
+0934a4d8
+```
+
+## Usage
+
+To tag each row of an `arc_jsons` table with version information (the example assumes the task ID is stored in the `id` column):
+
+```python
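+import json
+import pymysql
+
+# Load V1 task IDs
+with open('arc_v1_official_task_ids.json', 'r', encoding='utf-8') as f:
+    v1_data = json.load(f)
+v1_task_ids = set(v1_data['all_task_ids'])
+
+# Connect to database (fill in your own connection parameters)
+connection = pymysql.connect(...)
+cursor = connection.cursor()
+
+# Add the version column only if it does not already exist.
+# MySQL has no ADD COLUMN IF NOT EXISTS, so check information_schema first.
+# VARCHAR(20) leaves room for the 14-character 'v2_or_extended' label.
+cursor.execute(
+    "SELECT COUNT(*) FROM information_schema.COLUMNS "
+    "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'arc_jsons' "
+    "AND COLUMN_NAME = 'version'"
+)
+if cursor.fetchone()[0] == 0:
+    cursor.execute("ALTER TABLE arc_jsons ADD COLUMN version VARCHAR(20)")
+
+# Tag V1 tasks (one parameterized UPDATE per ID)
+cursor.executemany(
+    "UPDATE arc_jsons SET version = 'v1' WHERE id = %s",
+    [(task_id,) for task_id in v1_task_ids],
+)
+
+# Tag every remaining (non-V1) task
+cursor.execute(
+    "UPDATE arc_jsons SET version = 'v2_or_extended' WHERE version IS NULL"
+)
+
+connection.commit()
+cursor.close()
+connection.close()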
+```
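+
+You can try the tagging logic without a MySQL server by running the same steps against an in-memory SQLite table. This sketch makes the following substitutions and assumptions:
+- It uses Python's built-in `sqlite3` module in place of `pymysql`.
+- It uses `?` placeholders instead of `%s`.
+- It checks for an existing column with `PRAGMA table_info`.
+- It assumes `arc_jsons` has a text `id` column holding task IDs.
+- The three-row table is made up for illustration, and `tag_versions` is a hypothetical helper.
+
+```python
+import sqlite3
+
+
+def tag_versions(conn: sqlite3.Connection, v1_task_ids: set[str]) -> dict[str, int]:
+    """Tag arc_jsons rows as 'v1' or 'v2_or_extended'; return row counts per version."""
+    cur = conn.cursor()
+    # PRAGMA table_info yields (cid, name, type, ...) rows; add the column only if missing.
+    columns = {row[1] for row in cur.execute("PRAGMA table_info(arc_jsons)")}
+    if "version" not in columns:
+        cur.execute("ALTER TABLE arc_jsons ADD COLUMN version TEXT")
+    cur.executemany(
+        "UPDATE arc_jsons SET version = 'v1' WHERE id = ?",
+        [(task_id,) for task_id in v1_task_ids],
+    )
+    cur.execute("UPDATE arc_jsons SET version = 'v2_or_extended' WHERE version IS NULL")
+    conn.commit()
+    return dict(cur.execute("SELECT version, COUNT(*) FROM arc_jsons GROUP BY version"))
+
+
+# Made-up table: two sample V1 IDs plus one non-V1 ID ("deadbeef").
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE arc_jsons (id TEXT PRIMARY KEY)")
+conn.executemany("INSERT INTO arc_jsons (id) VALUES (?)",
+                 [("007bbfb7",), ("00576224",), ("deadbeef",)])
+print(tag_versions(conn, {"007bbfb7", "00576224", "00d62c1b"}))
+```
+
+Calling `tag_versions` a second time leaves the counts unchanged. On the second call the column check skips the `ALTER TABLE`, and already-tagged rows are no longer `NULL`.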
+
+## References
+
+- [ARC-AGI Repository](https://github.com/fchollet/ARC-AGI)
+- [ARC Prize](https://arcprize.org/)
+- [Original Paper: On the Measure of Intelligence](https://arxiv.org/abs/1911.01547)