# Ferrero DAM Asset Metadata Extraction Guide
## Overview
This guide explains how to extract folder hierarchy, Global/Local status, and campaign relationships from Ferrero DAM asset JSON metadata files. This is critical for tracking the relationship between local assets and their global master campaigns.
## Problem Statement
When downloading a **LOCAL** asset from the Ferrero DAM, we need to identify and store the **GLOBAL master campaign** it came from in our database. This allows us to maintain the parent-child relationship between global masters and their local adaptations.
## JSON Structure Overview
The asset metadata JSON contains several key sections:
```json
{
  "name": "asset_filename.jpg",
  "asset_id": "unique_asset_id",
  "metadata": {
    "metadata_element_list": [
      // Contains asset-level metadata including Global/Local status
    ]
  },
  "inherited_metadata_collections": [
    // Contains folder/container metadata including campaign info
  ],
  "path_list": [
    // Contains folder hierarchy paths
  ]
}
```
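Before extracting anything, it can help to confirm that a downloaded file actually carries the three sections described above. The following is a minimal sketch, not part of the existing tooling; the helper name `missing_sections` and the constant `EXPECTED_SECTIONS` are hypothetical.

```python
# Top-level sections this guide reads from, as listed in the structure above.
EXPECTED_SECTIONS = ("metadata", "inherited_metadata_collections", "path_list")


def missing_sections(data):
    """Return the expected top-level sections that are absent or empty."""
    return [key for key in EXPECTED_SECTIONS if not data.get(key)]


# Illustrative input: campaign metadata is missing and path_list is empty.
sample = {
    "name": "asset_filename.jpg",
    "asset_id": "unique_asset_id",
    "metadata": {"metadata_element_list": []},
    "path_list": [],
}
print(missing_sections(sample))
# ['inherited_metadata_collections', 'path_list']
```

An empty result means all three sections are present and non-empty; it says nothing about whether the fields inside them are populated.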
## Key Extraction Points
### 1. Asset Global/Local Status
**Location**: `metadata.metadata_element_list[]`
**Field to find**: `FERRERO.FIELD.STATE` (named "Global/Local")
```json
{
  "id": "FERRERO.FIELD.STATE",
  "name": "Global/Local",
  "value": {
    "value": {
      "field_value": {
        "value": "GLOBAL" // or "LOCAL"
      },
      "display_value": "Global" // or "Local"
    }
  },
  "domain_id": "FERRERO.DOMAIN.GLOBAL.LOCAL"
}
```
**Extraction Logic**:
```python
def get_asset_global_local_status(data):
    """Extract Global/Local status from asset metadata"""
    metadata = data.get('metadata', {})
    metadata_elements = metadata.get('metadata_element_list', [])
    # The top-level list holds categories; each category has its own element list
    for category in metadata_elements:
        for element in category.get('metadata_element_list', []):
            if element.get('id') == 'FERRERO.FIELD.STATE':
                value = element.get('value', {}).get('value', {})
                return value.get('field_value', {}).get('value')  # Returns "GLOBAL" or "LOCAL"
    return None
```
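One detail worth knowing about the chained `.get(key, {})` pattern: the default is only used when the key is absent. If a key is present with an explicit `null`, `.get` returns `None` and the next `.get` raises `AttributeError`. Whether the DAM ever emits such nulls is an assumption here; the sketch below shows a hypothetical null-tolerant variant (`get_status_null_safe` and `_as_dict` are illustrative names) together with sample data that makes the category nesting explicit.

```python
def _as_dict(value):
    """Treat anything that is not a dict (for example None) as an empty dict."""
    return value if isinstance(value, dict) else {}


def get_status_null_safe(data):
    """Hypothetical null-tolerant variant of get_asset_global_local_status."""
    metadata = _as_dict(data.get("metadata"))
    for category in metadata.get("metadata_element_list") or []:
        for element in _as_dict(category).get("metadata_element_list") or []:
            element = _as_dict(element)
            if element.get("id") != "FERRERO.FIELD.STATE":
                continue
            inner = _as_dict(_as_dict(element.get("value")).get("value"))
            return _as_dict(inner.get("field_value")).get("value")
    return None


# Category wrapper around the element shown above (category name is illustrative).
sample = {
    "metadata": {
        "metadata_element_list": [
            {
                "name": "Illustrative category",
                "metadata_element_list": [
                    {
                        "id": "FERRERO.FIELD.STATE",
                        "name": "Global/Local",
                        "value": {"value": {"field_value": {"value": "LOCAL"},
                                            "display_value": "Local"}},
                    }
                ],
            }
        ]
    }
}
# Same field, but with an explicit null value (assumed shape, for illustration).
unset = {"metadata": {"metadata_element_list": [
    {"metadata_element_list": [{"id": "FERRERO.FIELD.STATE", "value": None}]}
]}}

print(get_status_null_safe(sample))  # LOCAL
print(get_status_null_safe(unset))   # None
```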
### 2. Campaign Information
**Location**: `inherited_metadata_collections[]`
**Container Type to find**: `L7+ - CAMPAIGN`
Each asset can be associated with **multiple campaigns**. This is the KEY to understanding local/global relationships.
**Important Fields**:
- `FERRERO.FIELD.CAMPAIGN ID` - The campaign's unique ID (e.g., "C000000068")
- `FERRERO.FIELD.CAMPAIGN NAME` - The campaign's full name
- `FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE` - **THE KEY FIELD** - Points to the global master campaign ID
```json
{
  "container_name": "CONTENT SCALING OLIVER TEST 3",
  "container_type_name": "L7+ - CAMPAIGN",
  "inherited_metadata_values": [
    {
      "id": "FERRERO.FIELD.CAMPAIGN ID",
      "metadata_element": {
        "id": "FERRERO.FIELD.CAMPAIGN ID",
        "name": "Campaign ID",
        "value": {
          "value": {
            "value": "C000000551"
          }
        }
      }
    },
    {
      "id": "FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE",
      "metadata_element": {
        "id": "FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE",
        "name": "Global Campaign Reference",
        "value": {
          "value": {
            "value": "C000000068" // <-- THIS IS THE GLOBAL MASTER!
          }
        }
      }
    }
  ]
}
```
**Extraction Logic**:
```python
def extract_campaign_info(data):
    """Extract all campaign information including global master reference"""
    collections = data.get('inherited_metadata_collections', [])
    campaigns = []
    for collection in collections:
        if collection.get('container_type_name') == 'L7+ - CAMPAIGN':
            campaign_info = {
                'container_name': collection.get('container_name'),
                'campaign_id': None,
                'campaign_name': None,
                'global_campaign_reference': None,  # <-- KEY FIELD
                'campaign_type': None
            }
            inherited_metadata = collection.get('inherited_metadata_values', [])
            for inherited in inherited_metadata:
                metadata_element = inherited.get('metadata_element', {})
                # Extract Campaign ID
                if metadata_element.get('id') == 'FERRERO.FIELD.CAMPAIGN ID':
                    campaign_info['campaign_id'] = (
                        metadata_element.get('value', {})
                        .get('value', {})
                        .get('value')
                    )
                # Extract Campaign Name
                if metadata_element.get('id') == 'FERRERO.FIELD.CAMPAIGN NAME':
                    campaign_info['campaign_name'] = (
                        metadata_element.get('value', {})
                        .get('value', {})
                        .get('value')
                    )
                # Extract Global Campaign Reference - THIS IS THE KEY!
                if metadata_element.get('id') == 'FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE':
                    campaign_info['global_campaign_reference'] = (
                        metadata_element.get('value', {})
                        .get('value', {})
                        .get('value')
                    )
                # Extract Campaign Type (prefer the display value, fall back to the raw value)
                if metadata_element.get('id') == 'FERRERO.FIELD.CAMPAIGN TYPE':
                    val_obj = metadata_element.get('value', {}).get('value', {})
                    campaign_info['campaign_type'] = (
                        val_obj.get('display_value') or
                        val_obj.get('field_value', {}).get('value')
                    )
            campaigns.append(campaign_info)
    return campaigns
```
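The three scalar fields (ID, name, global reference) are all read through the same `value.value.value` chain, so the lookup can be driven by a small table instead of repeated `if` blocks. The following is a sketch of that alternative, not a replacement for the function above; `SCALAR_FIELDS`, `read_scalar`, and `extract_campaigns_compact` are hypothetical names, and Campaign Type is left out because it uses a different value shape.

```python
CAMPAIGN_CONTAINER_TYPE = "L7+ - CAMPAIGN"

# Field ID -> key in the result dict (field IDs as listed in this guide).
SCALAR_FIELDS = {
    "FERRERO.FIELD.CAMPAIGN ID": "campaign_id",
    "FERRERO.FIELD.CAMPAIGN NAME": "campaign_name",
    "FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE": "global_campaign_reference",
}


def read_scalar(metadata_element):
    """Follow value -> value -> value, returning None if any level is missing."""
    node = metadata_element
    for _ in range(3):
        node = node.get("value") if isinstance(node, dict) else None
    return node


def extract_campaigns_compact(data):
    """Collect the scalar campaign fields for every campaign container."""
    campaigns = []
    for collection in data.get("inherited_metadata_collections") or []:
        if collection.get("container_type_name") != CAMPAIGN_CONTAINER_TYPE:
            continue
        info = {"container_name": collection.get("container_name")}
        info.update(dict.fromkeys(SCALAR_FIELDS.values()))  # default every field to None
        for inherited in collection.get("inherited_metadata_values") or []:
            element = inherited.get("metadata_element") or {}
            key = SCALAR_FIELDS.get(element.get("id"))
            if key:
                info[key] = read_scalar(element)
        campaigns.append(info)
    return campaigns


def _field(field_id, value):
    """Build one inherited value entry in the shape shown above (test helper)."""
    return {"id": field_id,
            "metadata_element": {"id": field_id, "value": {"value": {"value": value}}}}


sample = {"inherited_metadata_collections": [
    {"container_name": "Some folder", "container_type_name": "ILLUSTRATIVE OTHER TYPE"},
    {"container_name": "CONTENT SCALING OLIVER TEST 3",
     "container_type_name": "L7+ - CAMPAIGN",
     "inherited_metadata_values": [
         _field("FERRERO.FIELD.CAMPAIGN ID", "C000000551"),
         _field("FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE", "C000000068"),
     ]},
]}
for campaign in extract_campaigns_compact(sample):
    print(campaign["campaign_id"], "->", campaign["global_campaign_reference"])
# C000000551 -> C000000068
```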
### 3. Folder Hierarchy Paths
**Location**: `path_list[]`
Each asset can exist in multiple folder paths. Each path contains:
- `parents[]` - Array of folder objects from root to leaf
- `complete` - Boolean indicating if path is complete
- `tree_descriptor.tree_id` - Which tree the path belongs to
```json
{
  "path_list": [
    {
      "parents": [
        {
          "id": "unique_folder_id",
          "name": "05. Final Assets",
          "container_state": "NORMAL"
        },
        {
          "id": "unique_folder_id",
          "name": "NUTELLA PLANT-BASED LAUNCH",
          "container_state": "NORMAL"
        },
        {
          "id": "unique_folder_id",
          "name": "Global",
          "container_state": "NORMAL"
        }
      ],
      "complete": true,
      "tree_descriptor": {
        "tree_id": "ARTESIA.PUBLIC.TREE"
      }
    }
  ]
}
```
**Extraction Logic**:
```python
def extract_folder_paths(data):
    """Extract all folder hierarchy paths"""
    path_list = data.get('path_list', [])
    paths = []
    for path in path_list:
        parents = path.get('parents', [])
        # Folder names in the order they appear in parents[]
        folder_names = [parent.get('name') for parent in parents]
        full_path = ' / '.join(folder_names)
        paths.append({
            'full_path': full_path,
            'folders': folder_names,
            'complete': path.get('complete', False),
            'tree_id': path.get('tree_descriptor', {}).get('tree_id')
        })
    return paths
```
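Since an asset can appear under several paths, callers often want only the complete ones, optionally restricted to one tree. The sketch below illustrates that filtering on the sample data from this section; `complete_paths` is a hypothetical helper, and skipping unnamed folders is an assumption made so that the join never receives `None`.

```python
def complete_paths(data, tree_id=None):
    """Return ' / '-joined folder paths that are marked complete.

    If tree_id is given, only paths belonging to that tree are kept.
    """
    result = []
    for path in data.get("path_list") or []:
        if not path.get("complete", False):
            continue
        if tree_id and (path.get("tree_descriptor") or {}).get("tree_id") != tree_id:
            continue
        # Skip folders without a name so str.join never sees None.
        names = [p["name"] for p in path.get("parents") or [] if p.get("name")]
        result.append(" / ".join(names))
    return result


sample = {"path_list": [
    {"parents": [{"id": "unique_folder_id", "name": "05. Final Assets"},
                 {"id": "unique_folder_id", "name": "NUTELLA PLANT-BASED LAUNCH"},
                 {"id": "unique_folder_id", "name": "Global"}],
     "complete": True,
     "tree_descriptor": {"tree_id": "ARTESIA.PUBLIC.TREE"}},
    # Illustrative incomplete path; it is filtered out.
    {"parents": [{"id": "unique_folder_id", "name": "Partial"}], "complete": False},
]}
print(complete_paths(sample, tree_id="ARTESIA.PUBLIC.TREE"))
# ['05. Final Assets / NUTELLA PLANT-BASED LAUNCH / Global']
```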
## Real-World Example
### Scenario: Local Asset with Global Master Reference
**Asset**: `nutella pbased.jpg`
- **Asset Global/Local Status**: `GLOBAL` (at asset level)
- **Associated with 2 Campaigns**:
#### Campaign 1: Local/Scaled Campaign
```
Container: CONTENT SCALING OLIVER TEST 3
Campaign ID: C000000551
Campaign Name: CONTENT_SCALING_OLI_JV_LA_DE_NUT_0000551
Global Campaign Reference: C000000068 ← Points to the global master!
```
#### Campaign 2: Global Master Campaign
```
Container: NUTELLA PLANT-BASED LAUNCH
Campaign ID: C000000068 ← This is the global master!
Campaign Name: GL_FY25_NUT_30_NUTELLA_PLANT_00068
Global Campaign Reference: (empty/not set)
```
### Understanding the Relationship
1. **Campaign C000000551** is a local/scaled campaign
2. Its `Global Campaign Reference` field contains **C000000068**
3. **Campaign C000000068** is the global master campaign
4. Therefore: When downloading an asset from C000000551, store C000000068 as the global_master_campaign_id
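The four steps above can be sketched as two small functions over the campaign dictionaries returned by `extract_campaign_info`. The names `link_local_to_global` and `master_ids` are hypothetical; the input below is reduced to the two keys the logic needs and uses the campaign IDs from this example.

```python
def link_local_to_global(campaigns):
    """Return (local_campaign_id, global_master_campaign_id) pairs.

    A campaign counts as local when its global_campaign_reference is set.
    """
    return [
        (c["campaign_id"], c["global_campaign_reference"])
        for c in campaigns
        if c.get("global_campaign_reference")
    ]


def master_ids(campaigns):
    """Campaign IDs without a global reference, i.e. candidate global masters."""
    return [c["campaign_id"] for c in campaigns
            if not c.get("global_campaign_reference")]


campaigns = [
    {"campaign_id": "C000000551", "global_campaign_reference": "C000000068"},
    {"campaign_id": "C000000068", "global_campaign_reference": None},
]
print(link_local_to_global(campaigns))  # [('C000000551', 'C000000068')]
print(master_ids(campaigns))            # ['C000000068']
```

Returning every pair, rather than a single ID, keeps the information intact when an asset sits in more than one local campaign.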
## Database Storage Recommendation
When storing asset metadata in your database:
```python
def process_asset_for_database(json_data):
    """Process asset JSON and prepare database record"""
    # Extract basic asset info
    asset_id = json_data.get('asset_id')
    asset_name = json_data.get('name')
    # Extract Global/Local status
    global_local_status = get_asset_global_local_status(json_data)
    # Extract campaign info
    campaigns = extract_campaign_info(json_data)
    # Extract folder paths
    folder_paths = extract_folder_paths(json_data)
    # Determine the global master campaign
    global_master_campaign_id = None
    local_campaign_id = None
    for campaign in campaigns:
        if campaign['global_campaign_reference']:
            # This is a local campaign pointing to a global master.
            # If several campaigns carry a reference, the last one wins.
            local_campaign_id = campaign['campaign_id']
            global_master_campaign_id = campaign['global_campaign_reference']
        elif global_master_campaign_id is None:
            # This might be the global master itself
            # (only set if we haven't found a reference yet)
            global_master_campaign_id = campaign['campaign_id']
    # Prepare database record
    db_record = {
        'asset_id': asset_id,
        'asset_name': asset_name,
        'global_local_status': global_local_status,
        'local_campaign_id': local_campaign_id,
        'global_master_campaign_id': global_master_campaign_id,  # KEY FIELD!
        'folder_paths': [path['full_path'] for path in folder_paths],
        'campaigns': campaigns  # Store full campaign details as JSON
    }
    return db_record
```
## Database Schema Suggestion
```sql
CREATE TABLE dam_assets (
    id SERIAL PRIMARY KEY,
    asset_id VARCHAR(255) UNIQUE NOT NULL,
    asset_name VARCHAR(500),
    global_local_status VARCHAR(20), -- 'GLOBAL' or 'LOCAL'
    -- Campaign tracking
    local_campaign_id VARCHAR(50), -- The immediate campaign this asset belongs to
    global_master_campaign_id VARCHAR(50), -- The global master campaign (KEY!)
    -- Additional metadata
    folder_paths JSONB, -- Array of folder paths
    campaigns JSONB, -- Full campaign details
    -- Timestamps
    imported_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
-- Index for finding all local assets of a global master
CREATE INDEX idx_global_master_campaign ON dam_assets(global_master_campaign_id);
-- Index for Global/Local status
CREATE INDEX idx_global_local_status ON dam_assets(global_local_status);
```
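To show how a record in the shape returned by `process_asset_for_database` might be written to such a table, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. This is an assumption made so the example runs without a server: the `JSONB` columns become `TEXT` holding serialized JSON, the timestamp columns are omitted, and `save_record` is a hypothetical helper. The `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer.

```python
import json
import sqlite3

# SQLite stand-in for the PostgreSQL table above: JSONB columns become TEXT.
DDL = """
CREATE TABLE dam_assets (
    id INTEGER PRIMARY KEY,
    asset_id TEXT UNIQUE NOT NULL,
    asset_name TEXT,
    global_local_status TEXT,
    local_campaign_id TEXT,
    global_master_campaign_id TEXT,
    folder_paths TEXT,
    campaigns TEXT
)
"""

UPSERT = """
INSERT INTO dam_assets (asset_id, asset_name, global_local_status,
                        local_campaign_id, global_master_campaign_id,
                        folder_paths, campaigns)
VALUES (:asset_id, :asset_name, :global_local_status,
        :local_campaign_id, :global_master_campaign_id,
        :folder_paths, :campaigns)
ON CONFLICT(asset_id) DO UPDATE SET
    asset_name = excluded.asset_name,
    global_local_status = excluded.global_local_status,
    local_campaign_id = excluded.local_campaign_id,
    global_master_campaign_id = excluded.global_master_campaign_id,
    folder_paths = excluded.folder_paths,
    campaigns = excluded.campaigns
"""


def save_record(conn, db_record):
    """Insert or update one asset row, serializing list fields as JSON text."""
    params = dict(db_record)
    params["folder_paths"] = json.dumps(db_record.get("folder_paths", []))
    params["campaigns"] = json.dumps(db_record.get("campaigns", []))
    conn.execute(UPSERT, params)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
record = {
    "asset_id": "unique_asset_id",
    "asset_name": "nutella pbased.jpg",
    "global_local_status": "GLOBAL",
    "local_campaign_id": "C000000551",
    "global_master_campaign_id": "C000000068",
    "folder_paths": ["05. Final Assets / NUTELLA PLANT-BASED LAUNCH / Global"],
    "campaigns": [],
}
save_record(conn, record)
save_record(conn, record)  # second call updates the row instead of duplicating it
print(conn.execute("SELECT COUNT(*) FROM dam_assets").fetchone()[0])  # 1
```

Keying the upsert on `asset_id` means re-importing the same asset refreshes its campaign links rather than failing on the unique constraint.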
## Query Examples
### Find all local assets for a global master campaign:
```sql
SELECT * FROM dam_assets
WHERE global_master_campaign_id = 'C000000068'
AND local_campaign_id IS NOT NULL
AND local_campaign_id != global_master_campaign_id;
```
### Find the global master for a local asset:
```sql
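SELECT master.* FROM dam_assets local
JOIN dam_assets master
  ON master.global_master_campaign_id = local.global_master_campaign_id
 AND master.local_campaign_id IS NULL -- assets stored only under the master campaign have no local campaign
WHERE local.asset_id = 'specific_local_asset_id';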
```
## Complete Extraction Script
See `extract_folder_hierarchy.py` for a complete, working implementation that extracts:
- Asset Global/Local status
- All campaign information including global master references
- Complete folder hierarchy paths
- Summary of asset relationships
**Usage**:
```bash
python3 extract_folder_hierarchy.py "asset_metadata.json"
```
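The contents of `extract_folder_hierarchy.py` are not reproduced in this guide, so the following is only a sketch of how a script invoked this way might load its input; `load_asset_json` and `main` are hypothetical names and do not describe the actual script.

```python
import json
import sys
from pathlib import Path


def load_asset_json(path):
    """Read one asset metadata file and return it as a dict."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return data


def main(argv):
    """Print the asset's ID and name; return a process exit code."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} ASSET_METADATA.json", file=sys.stderr)
        return 2
    data = load_asset_json(argv[1])
    print(f"asset_id={data.get('asset_id')} name={data.get('name')}")
    return 0
```

In a real script, `main(sys.argv)` would typically be called under an `if __name__ == "__main__":` guard, with its return value passed to `sys.exit`.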
## Key Takeaways
1. ✅ **Global/Local Status** is at the asset level (`FERRERO.FIELD.STATE`)
2. ✅ **Campaign Info** is in `inherited_metadata_collections[]` where `container_type_name == 'L7+ - CAMPAIGN'`
3. ✅ **Global Master Reference** is the `FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE` field
4. ✅ An asset can be in **multiple campaigns** - check all of them
5. ✅ The campaign with a `global_campaign_reference` is a local/scaled campaign
6. ✅ The campaign ID in `global_campaign_reference` is the global master you need to store
## Critical Fields Summary
| Field ID | Field Name | Location | Purpose |
|----------|-----------|----------|---------|
| `FERRERO.FIELD.STATE` | Global/Local | `metadata.metadata_element_list[]` | Asset's global/local designation |
| `FERRERO.FIELD.CAMPAIGN ID` | Campaign ID | `inherited_metadata_collections[]` | The campaign's unique identifier |
| `FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE` | Global Campaign Reference | `inherited_metadata_collections[]` | **THE KEY** - Points to global master |
| `FERRERO.FIELD.CAMPAIGN NAME` | Campaign Name | `inherited_metadata_collections[]` | Human-readable campaign name |
## Questions or Issues?
When implementing this extraction:
1. Always check for the presence of `FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE`
2. An asset may have multiple campaigns - iterate through all of them
3. The global master campaign itself won't have a `global_campaign_reference` field set
4. Store both the local campaign ID and the global master campaign ID for complete traceability