Export Datasets

Learn how to add data export and synchronization capabilities to your IDAH plugin using the Sync Service backend.

Overview

The Sync Service backend handles data export and synchronization with external systems. It lets you export datasets, entries, annotations, and media to common formats (JSON, CSV, XML) or push them to external APIs.

Add Sync Backend to Your Plugin

Option 1: Create New Plugin with Sync Backend

When creating a new plugin, select "Sync Service" during setup:

npx idah-plugin create my-plugin ./plugins

When prompted, select Sync Service from the list of backend services.

Option 2: Add Sync Backend to Existing Plugin

If you have an existing plugin without a sync backend, you can add it:

npx idah-plugin backend add my-plugin ./plugins

When prompted, select Sync Service to add export capabilities.

Generated File Structure

The sync backend generator creates the following files:

<plugin_name>/
   └─ backends/
      └─ sync/
         └─ <plugin_name_underscore>/
            ├─ sync.rb                                 # Sync service module (registers exporter)
            ├─ sync_spec.rb                            # Sync service tests
            ├─ export.rb                               # Export/sync logic
            └─ export_spec.rb                          # Export tests

Implementation Steps

Step 1: Register the Exporter

The sync module registers your exporter with IDAH:

backends/sync/<plugin_name_underscore>/sync.rb
module YourPlugin
  class Sync
    def self.init(context)
      context.register_exports(
        "your-plugin",      # Export identifier
        YourPlugin::Export  # Export class
      )
    end
  end
end

Step 2: Implement the Export Logic

Implement the core export logic in your export class:

backends/sync/<plugin_name_underscore>/export.rb
require "json"

module YourPlugin
  class Export
    def export(context)
      # Get dataset IDs being exported
      dataset_ids = context.dataset_ids

      # Get export options
      options = context.options

      # Create output file
      file = context.io.file(format: "json")

      # Process datasets
      all_data = []
      context.datasets.each do |dataset|
        all_data << export_dataset(dataset, options)
      end

      # Write to file
      File.write(file.path, all_data.to_json)

      # Progress is auto-updated when using datasets iterator
    end

    private

    def export_dataset(dataset, options)
      # Your export logic here
      {
        id: dataset.record.id,
        name: dataset.record.name,
        entries: export_entries(dataset)
      }
    end

    def export_entries(dataset)
      dataset.entries.map do |entry|
        {
          id: entry.record.id,
          annotations: entry.annotations.map(&:annotation)
        }
      end
    end
  end
end

Export Context API

Context Attributes

context.dataset_ids   # Array of dataset IDs
context.options       # Export options hash
context.io            # IO context for file operations
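Option keys come from whoever triggers the export, so reading them with explicit defaults keeps the export predictable when a key is missing. A minimal sketch (the `format` and `pretty` keys are hypothetical examples, not defined by the IDAH API):

```ruby
# Hypothetical option handling: the keys below are illustrative,
# not part of the IDAH export API.
def resolve_options(options)
  {
    format: options.fetch(:format, "json"), # output format, defaulting to JSON
    pretty: options.fetch(:pretty, false)   # pretty-print the output?
  }
end
```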

Iterate Through Datasets

context.datasets.each do |dataset|
  # Access dataset data
  dataset_data = dataset.record
  dataset_id = dataset.record.id
  dataset_name = dataset.record.name

  # Get entries
  entries = dataset.entries

  # Get filtered entries
  completed_entries = dataset.entries({ status: "completed" })
end

Note: Progress is automatically updated as you iterate through datasets.

Access Entries and Annotations

dataset.entries.each do |entry|
  # Access entry data
  entry_id = entry.record.id
  resource = entry.record.resource
  status = entry.record.status

  # Get annotations
  annotations = entry.annotations

  # Get filtered annotations
  boxes = entry.annotations({ type: "bounding_box" })

  # Get media files
  medias = entry.medias
  original = entry.medias({ key: "" }).first
end

Download Media Files

entry.medias.each do |media|
  filename = media.media.filename
  mime_type = media.media.mime_type

  # Download media file
  binary_data = media.download

  # Save to export directory
  File.binwrite(File.join(dir, filename), binary_data)
end
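Media filenames are only as unique as the uploads behind them; if two entries share a filename, the second write clobbers the first. A small stdlib-only helper (not part of the IDAH API) can deduplicate names before writing:

```ruby
require "set"

# Return a name that does not collide with any already-taken name,
# appending a numeric suffix before the extension when needed.
def unique_filename(filename, taken)
  return filename unless taken.include?(filename)

  ext  = File.extname(filename)
  base = File.basename(filename, ext)
  n = 1
  n += 1 while taken.include?("#{base}_#{n}#{ext}")
  "#{base}_#{n}#{ext}"
end

taken = Set.new
["scan.png", "scan.png", "scan.png"].each do |name|
  picked = unique_filename(name, taken)
  taken << picked
end
# taken now holds "scan.png", "scan_1.png", "scan_2.png"
```

Track the `taken` set across the whole export loop and pass each media filename through the helper before writing the file.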

IO Operations

# Single output file in the requested format
file = context.io.file(format: "json")
File.write(file.path, data.to_json)

# Working directory for multi-file exports
dir = context.io.directory
File.write(File.join(dir, "data.json"), data.to_json)

# Zip the working directory into a single archive
zip_path = context.io.zip_directory

Implementation Workflow

1. Add Sync Backend (if not present)

npx idah-plugin backend add data-exporter ./plugins

Select "Sync Service" when prompted. This generates the sync backend structure.

2. Install Dependencies

cd plugins/data-exporter
bundle install

3. Implement Export Logic

Edit backends/sync/<plugin_name_underscore>/export.rb:

backends/sync/<plugin_name_underscore>/export.rb
require "json"

module YourPlugin
  class Export
    def export(context)
      Verse.logger.info "Starting export for #{context.dataset_ids.size} datasets"

      # Create output file
      file = context.io.file(format: "json")

      # Export all datasets
      all_data = []
      context.datasets.each do |dataset|
        all_data << export_dataset(dataset)
      end

      # Write to file
      File.write(file.path, JSON.pretty_generate(all_data))

      Verse.logger.info "Export complete"
    rescue StandardError => e
      Verse.logger.error "Export failed: #{e.message}"
      context.error!(e.message)
      raise
    end

    private

    def export_dataset(dataset)
      {
        id: dataset.record.id,
        name: dataset.record.name,
        modality: dataset.record.modality,
        entries: export_entries(dataset)
      }
    end

    def export_entries(dataset)
      dataset.entries.map do |entry|
        {
          id: entry.record.id,
          resource: entry.record.resource,
          status: entry.record.status,
          annotations: entry.annotations.map { |a| export_annotation(a) }
        }
      end
    end

    def export_annotation(annotation)
      {
        id: annotation.record.id,
        dimensions: annotation.record.dimensions,
        shape: annotation.record.shape
      }
    end
  end
end

4. Write Tests

backends/sync/<plugin_name_underscore>/export_spec.rb
require "spec_helper"
require_relative "export"

RSpec.describe YourPlugin::Export do
  let(:export) { described_class.new }

  describe "#export" do
    it "exports datasets successfully" do
      context = double("context")
      io = double("io")
      file = double("file", path: "/tmp/export.json")

      allow(context).to receive(:dataset_ids).and_return(["ds1"])
      allow(context).to receive(:options).and_return({})
      allow(context).to receive(:io).and_return(io)
      allow(context).to receive(:datasets).and_return([])

      allow(io).to receive(:file).and_return(file)

      expect { export.export(context) }.not_to raise_error
    end
  end
end

5. Run Tests

bundle exec rspec backends/sync/

Data Context Objects

DatasetContext

context.datasets.each do |dataset|
  # Access dataset record
  dataset.record.id
  dataset.record.name
  dataset.record.modality

  # Get all entries
  entries = dataset.entries

  # Get filtered entries
  entries = dataset.entries({ status: "completed" })
end

EntryContext

dataset.entries.each do |entry|
  # Access entry record
  entry.record.id
  entry.record.resource
  entry.record.status

  # Get annotations
  annotations = entry.annotations
  annotations = entry.annotations({ type: "bounding_box" })

  # Get media files
  medias = entry.medias
  original = entry.medias({ key: "" }).first
end

AnnotationContext

entry.annotations.each do |annotation|
  annotation.record.id
  annotation.record.dimensions
  annotation.record.annotation
  annotation.record.metadata
end

MediaContext

entry.medias.each do |media|
  media.media.resource
  media.media.key
  media.media.filename
  media.media.mime_type

  # Download the file
  binary_data = media.download
  File.binwrite(media.media.filename, binary_data)
end

Common Export Patterns

Export to Single JSON File

def export(context)
  file = context.io.file(format: "json")

  all_data = []
  context.datasets.each do |dataset|
    all_data << {
      dataset: dataset.record,
      entries: dataset.entries.map { |e| format_entry(e) }
    }
  end

  File.write(file.path, JSON.pretty_generate(all_data))
end
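Export to CSV

The overview also lists CSV as an output format. A sketch using Ruby's standard csv library, assuming each entry is flattened to a fixed set of columns (the column names are illustrative, and passing format: "csv" to the IO context is an assumption):

```ruby
require "csv"

# Build a CSV document with one row per entry.
def entries_to_csv(entries)
  CSV.generate do |csv|
    csv << ["entry_id", "status", "annotation_count"] # header row
    entries.each do |e|
      csv << [e[:id], e[:status], e[:annotations].size]
    end
  end
end
```

Inside export, the result can be written the same way as JSON: file = context.io.file(format: "csv"), then File.write(file.path, entries_to_csv(rows)).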

Export to Multiple Files

def export(context)
  dir = context.io.directory

  context.datasets.each_with_index do |dataset, idx|
    filename = "dataset_#{idx + 1}.json"
    data = export_dataset(dataset)
    File.write(File.join(dir, filename), data.to_json)
  end

  # Create summary
  summary = { total: context.dataset_ids.size }
  File.write(File.join(dir, "summary.json"), summary.to_json)

  # Zip everything
  zip_path = context.io.zip_directory
  Verse.logger.info "Export zipped to #{zip_path}"
end

Export to External API

require "net/http"
require "json"

def export(context)
  api_key = context.options[:api_key]
  api_url = context.options[:api_url]

  context.datasets.each do |dataset|
    data = prepare_dataset_data(dataset)
    send_to_api(data, api_url, api_key)
  end
end

def send_to_api(data, url, api_key)
  uri = URI(url)
  req = Net::HTTP::Post.new(uri)
  req["Authorization"] = "Bearer #{api_key}"
  req["Content-Type"] = "application/json"
  req.body = data.to_json

  Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    response = http.request(req)
    raise "API error: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  end
end
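Network calls fail transiently, so it is worth wrapping send_to_api in a bounded retry. A sketch (not part of the IDAH API; the sleeper lambda is injectable so tests can skip real waiting, and the backoff timings are arbitrary):

```ruby
# Run the block, retrying up to `attempts` times with linear backoff.
def with_retries(attempts: 3, sleeper: ->(seconds) { sleep(seconds) })
  tries = 0
  begin
    yield
  rescue StandardError
    tries += 1
    raise if tries >= attempts
    sleeper.call(tries) # wait 1s, then 2s, ... between attempts
    retry
  end
end
```

Usage: with_retries { send_to_api(data, api_url, api_key) }.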

Export with Media Files

def export(context)
  dir = context.io.directory

  # Create media directory
  media_dir = File.join(dir, "media")
  Dir.mkdir(media_dir)

  context.datasets.each do |dataset|
    # Export data
    data = export_dataset_data(dataset)
    File.write(File.join(dir, "#{dataset.record.id}.json"), data.to_json)

    # Export media files
    dataset.entries.each do |entry|
      entry.medias.each do |media|
        media_path = File.join(media_dir, media.media.filename)
        File.binwrite(media_path, media.download)
      end
    end
  end

  # Zip everything
  context.io.zip_directory
end

Best Practices

1. Handle Large Datasets

def export(context)
  file = context.io.file(format: "json")

  # Stream data to avoid memory issues
  File.open(file.path, "w") do |f|
    f.write("[")

    context.datasets.each_with_index do |dataset, idx|
      f.write(",") if idx > 0
      f.write(export_dataset(dataset).to_json)
    end

    f.write("]")
  end
end

2. Handle Errors Gracefully

def export(context)
  context.datasets.each do |dataset|
    begin
      export_dataset(dataset)
    rescue StandardError => e
      Verse.logger.warn "Failed to export #{dataset.record.id}: #{e.message}"
      # Continue or fail based on your requirements
    end
  end
end

3. Validate Options

def export(context)
  raise ArgumentError, "API key required" unless context.options[:api_key]
  raise ArgumentError, "No datasets to export" if context.dataset_ids.empty?

  # Proceed with export
end

4. Clean Up Resources

def export(context)
  temp_files = []

  begin
    # Create and process temp files
    temp_files << create_temp_file
    # ... processing
  ensure
    # Clean up
    temp_files.each { |f| File.unlink(f) if File.exist?(f) }
    context.io.cleanup
  end
end

Testing Your Sync Backend

Run Tests

bundle exec rspec backends/sync/
bundle exec rspec backends/sync/<plugin_name_underscore>/export_spec.rb

Test in IDAH Platform

  1. Build your plugin frontend: cd frontend && pnpm build
  2. Restart IDAH platform to load the plugin
  3. Create and annotate a dataset
  4. Trigger an export using your sync service
  5. Verify the exported data is correct
  6. Check logs for any errors or warnings

Real-World Example

See the UPD Exporter for a complete implementation.

📤 Ready to export data! Start by adding a sync backend to your plugin and implement your custom export logic.