
Ferrero DAM Asset Metadata Extraction Guide

Overview

This guide explains how to extract folder hierarchy, Global/Local status, and campaign relationships from Ferrero DAM asset JSON metadata files. This is critical for tracking the relationship between local assets and their global master campaigns.

Problem Statement

When downloading a LOCAL asset from the Ferrero DAM, we need to identify and store the GLOBAL master campaign it came from in our database. This allows us to maintain the parent-child relationship between global masters and their local adaptations.

JSON Structure Overview

The asset metadata JSON contains several key sections:

{
  "name": "asset_filename.jpg",
  "asset_id": "unique_asset_id",
  "metadata": {
    "metadata_element_list": [
      // Contains asset-level metadata including Global/Local status
    ]
  },
  "inherited_metadata_collections": [
    // Contains folder/container metadata including campaign info
  ],
  "path_list": [
    // Contains folder hierarchy paths
  ]
}
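Here is a minimal sketch for loading one metadata file and noting which of the three sections above are missing. It assumes the file holds a single asset object at the top level, and `missing_sections` and `load_asset_metadata` are hypothetical names. The `//` comments in these outlines are annotations only: a real JSON file cannot contain them, and `json.load` would reject it.

```python
import json
from pathlib import Path
from typing import Any, Union

# Top-level sections this guide reads from (see the outline above)
EXPECTED_SECTIONS = ("metadata", "inherited_metadata_collections", "path_list")


def missing_sections(data: dict[str, Any]) -> list[str]:
    """Return the expected top-level keys that are absent from an asset object."""
    return [key for key in EXPECTED_SECTIONS if key not in data]


def load_asset_metadata(path: Union[str, Path]) -> dict[str, Any]:
    """Load one asset JSON file (assumes the asset object is the top-level value)."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    gaps = missing_sections(data)
    if gaps:
        # Not fatal: each extractor in this guide falls back to an empty default
        print(f"warning: {path} has no {', '.join(gaps)}")
    return data
```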

Key Extraction Points

1. Asset Global/Local Status

Location: metadata.metadata_element_list[] (the field sits inside each category entry's own nested metadata_element_list)

Field to find: FERRERO.FIELD.STATE (named "Global/Local")

{
  "id": "FERRERO.FIELD.STATE",
  "name": "Global/Local",
  "value": {
    "value": {
      "field_value": {
        "value": "GLOBAL"  // or "LOCAL"
      },
      "display_value": "Global"  // or "Local"
    }
  },
  "domain_id": "FERRERO.DOMAIN.GLOBAL.LOCAL"
}

Extraction Logic:

def get_asset_global_local_status(data):
    """Extract the Global/Local status ("GLOBAL" or "LOCAL") from asset metadata.

    Returns None if the FERRERO.FIELD.STATE field is absent.
    """
    metadata = data.get('metadata') or {}
    categories = metadata.get('metadata_element_list') or []

    for category in categories:
        # Fields are grouped into categories, each with its own element list
        for element in category.get('metadata_element_list') or []:
            if element.get('id') == 'FERRERO.FIELD.STATE':
                # `or {}` also guards against explicit JSON nulls
                value = (element.get('value') or {}).get('value') or {}
                return (value.get('field_value') or {}).get('value')  # "GLOBAL" or "LOCAL"

    return None
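The field carries both a coded value (`field_value.value`, e.g. "GLOBAL") and a label (`display_value`, e.g. "Global"). If you want one canonical form regardless of which is populated, a small normalizer can help. The hypothetical `normalize_state` below prefers the coded value and falls back to the upper-cased label. It assumes GLOBAL and LOCAL are the only values in the FERRERO.DOMAIN.GLOBAL.LOCAL domain, as shown above.

```python
from typing import Any, Optional

# Assumed to be the complete set of values in FERRERO.DOMAIN.GLOBAL.LOCAL
VALID_STATES = {"GLOBAL", "LOCAL"}


def normalize_state(value_obj: Optional[dict[str, Any]]) -> Optional[str]:
    """Return "GLOBAL"/"LOCAL" from a FERRERO.FIELD.STATE value object, else None.

    value_obj is the inner object found at element["value"]["value"].
    """
    if not value_obj:
        return None
    coded = (value_obj.get("field_value") or {}).get("value")
    label = value_obj.get("display_value")
    # Try the coded value first, then the human-readable label
    for candidate in (coded, label):
        if isinstance(candidate, str) and candidate.strip().upper() in VALID_STATES:
            return candidate.strip().upper()
    return None
```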

2. Campaign Information

Location: inherited_metadata_collections[]

Container Type to find: L7+ - CAMPAIGN

An asset can be associated with multiple campaigns, and the links between those campaigns are the key to understanding local/global relationships.

Important Fields:

  • FERRERO.FIELD.CAMPAIGN ID - The campaign's unique ID (e.g., "C000000068")
  • FERRERO.FIELD.CAMPAIGN NAME - The campaign's full name
  • FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE - THE KEY FIELD - Points to the global master campaign ID

{
  "container_name": "CONTENT SCALING OLIVER TEST 3",
  "container_type_name": "L7+ - CAMPAIGN",
  "inherited_metadata_values": [
    {
      "id": "FERRERO.FIELD.CAMPAIGN ID",
      "metadata_element": {
        "id": "FERRERO.FIELD.CAMPAIGN ID",
        "name": "Campaign ID",
        "value": {
          "value": {
            "value": "C000000551"
          }
        }
      }
    },
    {
      "id": "FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE",
      "metadata_element": {
        "id": "FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE",
        "name": "Global Campaign Reference",
        "value": {
          "value": {
            "value": "C000000068"  // <-- THIS IS THE GLOBAL MASTER!
          }
        }
      }
    }
  ]
}

Extraction Logic:

def extract_campaign_info(data):
    """Extract all campaign information including global master reference"""
    collections = data.get('inherited_metadata_collections', [])
    campaigns = []

    for collection in collections:
        if collection.get('container_type_name') == 'L7+ - CAMPAIGN':
            campaign_info = {
                'container_name': collection.get('container_name'),
                'campaign_id': None,
                'campaign_name': None,
                'global_campaign_reference': None,  # <-- KEY FIELD
                'campaign_type': None
            }

            inherited_metadata = collection.get('inherited_metadata_values', [])
            for inherited in inherited_metadata:
                metadata_element = inherited.get('metadata_element', {})

                # Extract Campaign ID
                if metadata_element.get('id') == 'FERRERO.FIELD.CAMPAIGN ID':
                    campaign_info['campaign_id'] = (
                        metadata_element.get('value', {})
                        .get('value', {})
                        .get('value')
                    )

                # Extract Campaign Name
                if metadata_element.get('id') == 'FERRERO.FIELD.CAMPAIGN NAME':
                    campaign_info['campaign_name'] = (
                        metadata_element.get('value', {})
                        .get('value', {})
                        .get('value')
                    )

                # Extract Global Campaign Reference - THIS IS THE KEY!
                if metadata_element.get('id') == 'FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE':
                    campaign_info['global_campaign_reference'] = (
                        metadata_element.get('value', {})
                        .get('value', {})
                        .get('value')
                    )

                # Extract Campaign Type
                if metadata_element.get('id') == 'FERRERO.FIELD.CAMPAIGN TYPE':
                    val_obj = metadata_element.get('value', {}).get('value', {})
                    campaign_info['campaign_type'] = (
                        val_obj.get('display_value') or
                        val_obj.get('field_value', {}).get('value')
                    )

            campaigns.append(campaign_info)

    return campaigns
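The four branches above repeat the same lookup. Below is a sketch of a table-driven variant that uses the same field IDs. It reads each value once and handles both value shapes seen in this guide: plain values (`value.value.value`) and domain values with `field_value`/`display_value` (as used for Campaign Type). `CAMPAIGN_FIELDS`, `read_value` and `extract_campaigns_table_driven` are illustrative names, not part of the existing script. Treating any value object that has `display_value` or `field_value` as a domain value is an assumption.

```python
from typing import Any

CAMPAIGN_CONTAINER_TYPE = "L7+ - CAMPAIGN"

# Output key -> DAM field ID, as used in this guide
CAMPAIGN_FIELDS = {
    "campaign_id": "FERRERO.FIELD.CAMPAIGN ID",
    "campaign_name": "FERRERO.FIELD.CAMPAIGN NAME",
    "global_campaign_reference": "FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE",
    "campaign_type": "FERRERO.FIELD.CAMPAIGN TYPE",
}


def read_value(metadata_element: dict[str, Any]) -> Any:
    """Return a field's value from either a plain or a domain-style value object."""
    inner = (metadata_element.get("value") or {}).get("value") or {}
    if "display_value" in inner or "field_value" in inner:
        # Domain field: prefer the label, as extract_campaign_info does for Campaign Type
        return inner.get("display_value") or (inner.get("field_value") or {}).get("value")
    return inner.get("value")


def extract_campaigns_table_driven(data: dict[str, Any]) -> list[dict[str, Any]]:
    campaigns = []
    for collection in data.get("inherited_metadata_collections") or []:
        if collection.get("container_type_name") != CAMPAIGN_CONTAINER_TYPE:
            continue
        # Index this container's metadata elements by field ID
        by_id = {}
        for item in collection.get("inherited_metadata_values") or []:
            element = item.get("metadata_element") or {}
            by_id[element.get("id")] = element
        info = {"container_name": collection.get("container_name")}
        for key, field_id in CAMPAIGN_FIELDS.items():
            element = by_id.get(field_id)
            info[key] = read_value(element) if element else None
        campaigns.append(info)
    return campaigns
```

The output dictionaries have the same keys as those from `extract_campaign_info`, so either function can feed the database step.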

3. Folder Hierarchy Paths

Location: path_list[]

Each asset can exist in multiple folder paths. Each path contains:

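  • parents[] - Array of folder objects. In the sample below the order runs from the innermost folder ("05. Final Assets") to the outermost ("Global"); confirm this ordering against your own data before relying on it
  • complete - Boolean indicating whether the path is complete
  • tree_descriptor.tree_id - The tree the path belongs to
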
{
  "path_list": [
    {
      "parents": [
        {
          "id": "unique_folder_id",
          "name": "05. Final Assets",
          "container_state": "NORMAL"
        },
        {
          "id": "unique_folder_id",
          "name": "NUTELLA PLANT-BASED LAUNCH",
          "container_state": "NORMAL"
        },
        {
          "id": "unique_folder_id",
          "name": "Global",
          "container_state": "NORMAL"
        }
      ],
      "complete": true,
      "tree_descriptor": {
        "tree_id": "ARTESIA.PUBLIC.TREE"
      }
    }
  ]
}

Extraction Logic:

def extract_folder_paths(data):
    """Extract all folder hierarchy paths"""
    path_list = data.get('path_list', [])
    paths = []

    for path in path_list:
        parents = path.get('parents', [])
        folder_names = [parent.get('name') for parent in parents]
        full_path = ' / '.join(folder_names)

        paths.append({
            'full_path': full_path,
            'folders': folder_names,
            'complete': path.get('complete', False),
            'tree_id': path.get('tree_descriptor', {}).get('tree_id')
        })

    return paths
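In the sample above, `parents[]` lists the innermost folder first ("05. Final Assets") and "Global" last. If your data follows that order and you want paths that read top-down, reverse the list before joining. The sketch below does this and also keeps only complete paths in one tree. The hypothetical `root_first_paths` takes its `ARTESIA.PUBLIC.TREE` default from the sample. Both the ordering and the filter are assumptions to confirm against your own exports.

```python
from typing import Any


def root_first_paths(data: dict[str, Any], tree_id: str = "ARTESIA.PUBLIC.TREE") -> list[str]:
    """Build 'Outer / ... / Inner' strings, assuming parents[] is innermost-first."""
    paths = []
    for path in data.get("path_list") or []:
        if not path.get("complete", False):
            continue  # skip incomplete paths
        if (path.get("tree_descriptor") or {}).get("tree_id") != tree_id:
            continue  # skip paths that belong to other trees
        names = [parent.get("name") or "" for parent in path.get("parents") or []]
        paths.append(" / ".join(reversed(names)))
    return paths
```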

Real-World Example

Scenario: Local Asset with Global Master Reference

Asset: nutella pbased.jpg

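  • Asset Global/Local Status: GLOBAL (the asset-level flag; the asset still sits in a local/scaled campaign, so the campaign links below, not this flag, identify the global master)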
  • Associated with 2 Campaigns:

Campaign 1: Local/Scaled Campaign

Container: CONTENT SCALING OLIVER TEST 3
Campaign ID: C000000551
Campaign Name: CONTENT_SCALING_OLI_JV_LA_DE_NUT_0000551
Global Campaign Reference: C000000068  ← Points to the global master!

Campaign 2: Global Master Campaign

Container: NUTELLA PLANT-BASED LAUNCH
Campaign ID: C000000068  ← This is the global master!
Campaign Name: GL_FY25_NUT_30_NUTELLA_PLANT_00068
Global Campaign Reference: (empty/not set)

Understanding the Relationship

  1. Campaign C000000551 is a local/scaled campaign
  2. Its Global Campaign Reference field contains C000000068
  3. Campaign C000000068 is the global master campaign
  4. Therefore, when downloading an asset from C000000551, store C000000068 as the global_master_campaign_id
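The four steps above reduce to a lookup over the campaign list that `extract_campaign_info` returns. The hypothetical `resolve_lineage` below pairs every local/scaled campaign with the master it references. It also reports whether that master is attached to the same asset, as C000000068 is in this example.

```python
from typing import Any


def resolve_lineage(campaigns: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Pair each local/scaled campaign with the global master it references."""
    attached_ids = {c.get("campaign_id") for c in campaigns if c.get("campaign_id")}
    lineage = []
    for campaign in campaigns:
        master_id = campaign.get("global_campaign_reference")
        if not master_id:
            continue  # no reference: treated as a master, not a local campaign
        lineage.append({
            "local_campaign_id": campaign.get("campaign_id"),
            "global_master_campaign_id": master_id,
            "master_also_attached": master_id in attached_ids,
        })
    return lineage
```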

Database Storage Recommendation

When storing asset metadata in your database:

def process_asset_for_database(json_data):
    """Process asset JSON and prepare database record"""

    # Extract basic asset info
    asset_id = json_data.get('asset_id')
    asset_name = json_data.get('name')

    # Extract Global/Local status
    global_local_status = get_asset_global_local_status(json_data)

    # Extract campaign info
    campaigns = extract_campaign_info(json_data)

    # Extract folder paths
    folder_paths = extract_folder_paths(json_data)

    # Determine the global master campaign.
    # A campaign with a Global Campaign Reference is a local/scaled campaign;
    # if several exist, the last one is used.
    referencing = [c for c in campaigns if c['global_campaign_reference']]

    if referencing:
        local_campaign_id = referencing[-1]['campaign_id']
        global_master_campaign_id = referencing[-1]['global_campaign_reference']
    else:
        # No reference found: the asset sits in a global master campaign itself,
        # so use the first campaign that has an ID
        local_campaign_id = None
        global_master_campaign_id = next(
            (c['campaign_id'] for c in campaigns if c['campaign_id']), None
        )

    # Prepare database record
    db_record = {
        'asset_id': asset_id,
        'asset_name': asset_name,
        'global_local_status': global_local_status,
        'local_campaign_id': local_campaign_id,
        'global_master_campaign_id': global_master_campaign_id,  # KEY FIELD!
        'folder_paths': [path['full_path'] for path in folder_paths],
        'campaigns': campaigns  # Store full campaign details as JSON
    }

    return db_record
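The record mixes scalar columns with two list-valued fields intended for the JSONB columns in the schema below. Some PostgreSQL drivers adapt Python lists to SQL arrays rather than JSON (psycopg2 does, for example). One option is therefore to serialize those two fields with `json.dumps` before binding them. `to_db_params` and the column order are illustrative and must match whatever INSERT statement you use.

```python
import json
from typing import Any

# Column order for a hypothetical INSERT into dam_assets
DB_COLUMNS = (
    "asset_id", "asset_name", "global_local_status",
    "local_campaign_id", "global_master_campaign_id",
    "folder_paths", "campaigns",
)
JSON_COLUMNS = {"folder_paths", "campaigns"}


def to_db_params(db_record: dict[str, Any]) -> tuple[Any, ...]:
    """Return values in DB_COLUMNS order, JSON-encoding the JSONB fields."""
    params = []
    for col in DB_COLUMNS:
        value = db_record.get(col)
        if col in JSON_COLUMNS and value is not None:
            value = json.dumps(value)  # JSON text the database can cast to JSONB
        params.append(value)
    return tuple(params)
```

With psycopg2, these could be bound as `VALUES (%s, %s, %s, %s, %s, %s::jsonb, %s::jsonb)`. Missing JSON fields stay SQL NULL rather than becoming the JSON literal `null`.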

Database Schema Suggestion

CREATE TABLE dam_assets (
    id SERIAL PRIMARY KEY,
    asset_id VARCHAR(255) UNIQUE NOT NULL,
    asset_name VARCHAR(500),
    global_local_status VARCHAR(20), -- 'GLOBAL' or 'LOCAL'

    -- Campaign tracking
    local_campaign_id VARCHAR(50),  -- The local/scaled campaign that references a global master (NULL for master-only assets)
    global_master_campaign_id VARCHAR(50),  -- The global master campaign (KEY!)

    -- Additional metadata
    folder_paths JSONB,  -- Array of folder paths
    campaigns JSONB,  -- Full campaign details

    -- Timestamps
    imported_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Index for finding all local assets of a global master
CREATE INDEX idx_global_master_campaign ON dam_assets(global_master_campaign_id);

-- Index for Global/Local status
CREATE INDEX idx_global_local_status ON dam_assets(global_local_status);
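Because asset_id is UNIQUE, re-importing an asset should update its row rather than fail. The sketch below shows that upsert pattern with Python's built-in `sqlite3` so it runs without a server. SQLite stands in for PostgreSQL here, so JSONB becomes TEXT, SERIAL becomes INTEGER PRIMARY KEY, and NOW() becomes CURRENT_TIMESTAMP. PostgreSQL accepts the same `ON CONFLICT (asset_id) DO UPDATE` form. `upsert_asset` is a hypothetical helper.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS dam_assets (
    id INTEGER PRIMARY KEY,
    asset_id TEXT UNIQUE NOT NULL,
    asset_name TEXT,
    global_local_status TEXT,
    local_campaign_id TEXT,
    global_master_campaign_id TEXT,
    folder_paths TEXT,   -- JSONB in PostgreSQL
    campaigns TEXT,      -- JSONB in PostgreSQL
    imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""

UPSERT = """
INSERT INTO dam_assets (asset_id, asset_name, global_local_status,
                        local_campaign_id, global_master_campaign_id,
                        folder_paths, campaigns)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (asset_id) DO UPDATE SET
    asset_name = excluded.asset_name,
    global_local_status = excluded.global_local_status,
    local_campaign_id = excluded.local_campaign_id,
    global_master_campaign_id = excluded.global_master_campaign_id,
    folder_paths = excluded.folder_paths,
    campaigns = excluded.campaigns,
    updated_at = CURRENT_TIMESTAMP
"""


def upsert_asset(conn: sqlite3.Connection, rec: dict) -> None:
    """Insert a record, or update the existing row; imported_at keeps its first value."""
    conn.execute(UPSERT, (
        rec["asset_id"], rec.get("asset_name"), rec.get("global_local_status"),
        rec.get("local_campaign_id"), rec.get("global_master_campaign_id"),
        json.dumps(rec.get("folder_paths", [])), json.dumps(rec.get("campaigns", [])),
    ))
```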

Query Examples

Find all local assets for a global master campaign:

SELECT * FROM dam_assets
WHERE global_master_campaign_id = 'C000000068'
AND local_campaign_id IS NOT NULL
AND local_campaign_id != global_master_campaign_id;

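Find the global master assets for a local asset (master-only rows are stored with local_campaign_id NULL, so match on the shared global_master_campaign_id):

SELECT m.* FROM dam_assets AS loc
JOIN dam_assets AS m
  ON m.global_master_campaign_id = loc.global_master_campaign_id
 AND m.local_campaign_id IS NULL
WHERE loc.asset_id = 'specific_local_asset_id';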
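To check the lookup logic without a PostgreSQL server, the sketch below loads two illustrative rows into SQLite via Python's built-in `sqlite3` and runs both lookups. The row IDs `master_asset` and `scaled_asset` and the function `demo_lookups` are hypothetical. For an asset that sits only in a master campaign, `process_asset_for_database` stores `local_campaign_id` as NULL. The master-asset lookup here therefore matches on the shared `global_master_campaign_id` and requires `local_campaign_id IS NULL`.

```python
import sqlite3


def demo_lookups() -> tuple[list[str], list[str]]:
    """Run both lookups on an in-memory SQLite table holding two example rows."""
    conn = sqlite3.connect(":memory:")
    # Reduced copy of dam_assets: only the columns these queries touch
    conn.execute(
        "CREATE TABLE dam_assets ("
        " asset_id TEXT UNIQUE NOT NULL,"
        " local_campaign_id TEXT,"
        " global_master_campaign_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO dam_assets VALUES (?, ?, ?)",
        [
            ("master_asset", None, "C000000068"),          # only in the master campaign
            ("scaled_asset", "C000000551", "C000000068"),  # local/scaled adaptation
        ],
    )
    locals_for_master = [row[0] for row in conn.execute(
        "SELECT asset_id FROM dam_assets"
        " WHERE global_master_campaign_id = ?"
        " AND local_campaign_id IS NOT NULL"
        " AND local_campaign_id != global_master_campaign_id",
        ("C000000068",),
    )]
    masters_for_local = [row[0] for row in conn.execute(
        "SELECT m.asset_id FROM dam_assets AS loc"
        " JOIN dam_assets AS m"
        " ON m.global_master_campaign_id = loc.global_master_campaign_id"
        " AND m.local_campaign_id IS NULL"
        " WHERE loc.asset_id = ?",
        ("scaled_asset",),
    )]
    conn.close()
    return locals_for_master, masters_for_local
```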

Complete Extraction Script

See extract_folder_hierarchy.py for a complete, working implementation that extracts:

  • Asset Global/Local status
  • All campaign information including global master references
  • Complete folder hierarchy paths
  • Summary of asset relationships

Usage:

python3 extract_folder_hierarchy.py "asset_metadata.json"

Key Takeaways

  1. Global/Local Status is at the asset level (FERRERO.FIELD.STATE)
  2. Campaign Info is in inherited_metadata_collections[] where container_type_name == 'L7+ - CAMPAIGN'
  3. Global Master Reference is the FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE field
  4. An asset can be in multiple campaigns - check all of them
  5. The campaign with a global_campaign_reference is a local/scaled campaign
  6. The campaign ID in global_campaign_reference is the global master you need to store

Critical Fields Summary

| Field ID | Field Name | Location | Purpose |
|---|---|---|---|
| FERRERO.FIELD.STATE | Global/Local | metadata.metadata_element_list[] | Asset's global/local designation |
| FERRERO.FIELD.CAMPAIGN ID | Campaign ID | inherited_metadata_collections[] | The campaign's unique identifier |
| FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE | Global Campaign Reference | inherited_metadata_collections[] | THE KEY - Points to global master |
| FERRERO.FIELD.CAMPAIGN NAME | Campaign Name | inherited_metadata_collections[] | Human-readable campaign name |

Questions or Issues?

When implementing this extraction:

  1. Always check for the presence of FERRERO.FIELD.GLOBAL CAMPAIGN REFERENCE
  2. An asset may have multiple campaigns - iterate through all of them
  3. The global master campaign itself won't have a global_campaign_reference field set
  4. Store both the local campaign ID and the global master campaign ID for complete traceability
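The checklist above can also be turned into automated sanity checks over the output of `extract_campaign_info`. The hypothetical `check_campaigns` below flags three cases:

- a campaign that references itself (a master should carry no reference);
- a referencing campaign with no campaign ID;
- an asset whose local campaigns point at different masters, which the single global_master_campaign_id column cannot represent.

```python
from typing import Any


def check_campaigns(campaigns: list[dict[str, Any]]) -> list[str]:
    """Return human-readable warnings; an empty list means no issues were found."""
    warnings = []
    masters = set()
    for campaign in campaigns:
        cid = campaign.get("campaign_id")
        ref = campaign.get("global_campaign_reference")
        if not ref:
            continue  # no reference: treated as a global master campaign
        if not cid:
            warnings.append(
                f"campaign in container {campaign.get('container_name')!r} "
                "has a global reference but no campaign ID"
            )
        elif ref == cid:
            warnings.append(f"campaign {cid} references itself as its global master")
        masters.add(ref)
    if len(masters) > 1:
        warnings.append(f"local campaigns point at several masters: {sorted(masters)}")
    return warnings
```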