Focus: Master development workflows for different types of data applications, from dbt transformations to Python scripts and interactive Streamlit dashboards.

The 5X IDE provides specialized development environments for the most common data development tasks. This guide covers practical workflows for building data models, applications, and dashboards with real examples and best practices.
Development Environment
The IDE comes pre-configured with multiple Python versions, dbt environments, and development tools. No additional setup is required for basic development tasks.

Python development

Python environment overview

The IDE comes pre-installed with multiple Python versions managed through pyenv, providing flexibility for different project requirements and dependency compatibility. Available Python versions:
  • Python 3.9.23 - Legacy support for older projects
  • Python 3.10.18 - Stable version with good package compatibility
  • Python 3.11.13 - Default version (set by PYENV_VERSION)
  • Python 3.12.11 - Latest stable with performance improvements
  • Python 3.13.4 - Cutting-edge features and optimizations
View installed versions:
ls /root/.pyenv/versions

Virtual environment management

Python virtual environments provide isolated dependency management for your projects, preventing conflicts between different project requirements. Create a virtual environment:
# Using Python 3.11.13 (default)
/root/.pyenv/versions/3.11.13/bin/python -m venv my_project_env

# Using specific Python version
/root/.pyenv/versions/3.10.18/bin/python -m venv legacy_project_env
Activate and manage environments:
# Activate environment
source my_project_env/bin/activate

# Verify active environment (should show your env path)
which python

# Deactivate when finished
deactivate

Dependency management best practices

Maintain project dependencies using requirements.txt files for reproducible environments across team members and deployment targets. Create requirements.txt:
# Core data processing
pandas==2.0.3
numpy==1.24.3

# API and web requests  
requests==2.31.0
urllib3==2.0.4

# Visualization
matplotlib==3.7.2
seaborn==0.12.2

# Development tools
jupyter==1.0.0
pytest==7.4.0
Install and manage dependencies:
# Activate environment first
source my_project_env/bin/activate

# Install from requirements file
pip install -r requirements.txt

# Install additional packages and update requirements
pip install scikit-learn==1.3.0
pip freeze > requirements.txt
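You can also verify pinned packages programmatically from inside the environment using the standard library's `importlib.metadata`; a sketch (the package names checked here are illustrative):

```python
from importlib import metadata

# Report the installed version of each package we expect to be pinned
for pkg in ["pip", "setuptools"]:
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "NOT INSTALLED")
```

This is handy in CI or startup checks, where a missing or mismatched dependency should fail loudly before any data processing begins.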

Python development examples

Data processing script:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine

def process_customer_data(connection_string):
    """Process customer data from warehouse"""
    engine = create_engine(connection_string)
    
    # Load data
    df = pd.read_sql("SELECT * FROM customers", engine)
    
    # Data transformations
    df['full_name'] = df['first_name'] + ' ' + df['last_name']
    df['customer_tier'] = pd.cut(df['total_spent'], 
                                bins=[0, 100, 500, 1000, float('inf')],
                                labels=['Bronze', 'Silver', 'Gold', 'Platinum'])
    
    # Save processed data
    df.to_sql('processed_customers', engine, if_exists='replace', index=False)
    
    return df

if __name__ == "__main__":
    # Your connection string here
    conn_str = "your_connection_string"
    result = process_customer_data(conn_str)
    print(f"Processed {len(result)} customers")
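The tiering logic above can be exercised without a warehouse connection by running it against an in-memory frame; a minimal sketch (the sample rows are invented):

```python
import pandas as pd

# Sample rows standing in for the customers table
df = pd.DataFrame({
    "first_name": ["Ada", "Grace"],
    "last_name": ["Lovelace", "Hopper"],
    "total_spent": [75, 1200],
})

# Same transformations as process_customer_data, minus the database I/O
df["full_name"] = df["first_name"] + " " + df["last_name"]
df["customer_tier"] = pd.cut(
    df["total_spent"],
    bins=[0, 100, 500, 1000, float("inf")],
    labels=["Bronze", "Silver", "Gold", "Platinum"],
)
print(df[["full_name", "customer_tier"]])
```

Because `pd.cut` uses right-closed bins by default, a spend of exactly 100 lands in Bronze and 1200 lands in Platinum; verifying edge cases like this locally is much faster than round-tripping through the warehouse.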
API integration example:
import pandas as pd
import requests
from datetime import datetime

class DataAPI:
    def __init__(self, base_url, api_key):
        self.base_url = base_url
        self.headers = {'Authorization': f'Bearer {api_key}'}
    
    def fetch_data(self, endpoint, params=None):
        """Fetch data from API endpoint"""
        response = requests.get(
            f"{self.base_url}/{endpoint}",
            headers=self.headers,
            params=params
        )
        response.raise_for_status()
        return response.json()
    
    def process_and_save(self, endpoint, db_connection):
        """Fetch, process, and save data to database"""
        data = self.fetch_data(endpoint)
        
        # Process data
        df = pd.DataFrame(data)
        df['processed_at'] = datetime.now()
        
        # Save to database
        df.to_sql('api_data', db_connection, if_exists='append', index=False)
        
        return df
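Transient API failures (rate limits, gateway errors) are common in production, so the class's `requests.get` calls benefit from automatic retries. A hedged sketch using urllib3's `Retry` via a `requests.Session` (the retry counts and status codes are illustrative defaults, not 5X recommendations):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries=3, backoff=0.5):
    """Build a requests session that retries transient server errors."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,               # 0.5s, 1s, 2s, ... between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"],              # only retry idempotent requests
    )
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

To use it, store `make_session()` on the `DataAPI` instance and call `self.session.get(...)` in `fetch_data` instead of `requests.get(...)`.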

dbt development

The dbt Power User extension provides the most integrated development experience, automatically using your configured dbt settings from Settings → Credentials, including version selection, database connections, and target configuration. Key workflows:

Model execution

Run and test models: Execute individual models, selections, or entire dbt projects with the integrated test runner.

Lineage visualization

Understand dependencies: Interactive dependency graphs show upstream and downstream model relationships.

Documentation

Generate docs: Create and view dbt documentation with integrated preview and automatic refresh.

SQL compilation

Preview compiled SQL: See the actual SQL that will be executed before running models.

Command-line dbt development

For users preferring terminal-based workflows, the IDE provides pre-configured dbt virtual environments for each supported version. Activate dbt environment:
# List available dbt environments
ls /root/.venv

# Activate specific dbt version
source /root/.venv/dbt-1.8/bin/activate

# Verify dbt installation
dbt --version
Available dbt versions:
  • dbt-1.6 (/root/.venv/dbt-1.6/) - Legacy support
  • dbt-1.7 (/root/.venv/dbt-1.7/) - Stable version
  • dbt-1.8 (/root/.venv/dbt-1.8/) - Current stable
  • dbt-1.9 (/root/.venv/dbt-1.9/) - Latest features

dbt development workflow

Common dbt commands:
# Navigate to your dbt project directory
cd /path/to/your/dbt/project

# Run entire project
dbt run

# Run specific models
dbt run --select staging.stg_customers+

# Test your models
dbt test

# Generate documentation
dbt docs generate
dbt docs serve
Model development example:
-- models/staging/stg_customers.sql
SELECT
    customer_id::int AS customer_id,
    LOWER(TRIM(email)) AS email,
    INITCAP(first_name) AS first_name,
    INITCAP(last_name) AS last_name,
    created_at::timestamp AS created_at,
    CASE 
        WHEN status = 'A' THEN 'active'
        WHEN status = 'I' THEN 'inactive'
        ELSE 'unknown'
    END AS status
FROM {{ source('crm', 'customers') }}
WHERE customer_id IS NOT NULL
Model testing:
# models/schema.yml
version: 2

models:
  - name: stg_customers
    description: "Cleaned customer data from CRM system"
    columns:
      - name: customer_id
        description: "Unique customer identifier"
        tests:
          - unique
          - not_null
      - name: email
        description: "Customer email address"
        tests:
          - not_null
          - unique
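Since stg_customers normalizes status to a fixed set of values, an accepted_values test can guard that mapping as well. A sketch extending the schema file above (placement under the same model's columns list is assumed):

```yaml
      - name: status
        description: "Normalized customer status"
        tests:
          - accepted_values:
              values: ['active', 'inactive', 'unknown']
```

If an upstream status code appears that the CASE expression doesn't handle, every row falls into 'unknown' silently; this test at least alerts you if a value outside the expected set ever reaches the model.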

Lineage visualization

The IDE provides powerful lineage visualization that helps you understand data flow and model dependencies throughout your dbt project. To view lineage:
  1. Open any dbt model file in the editor
  2. Navigate to the Lineage tab in the IDE interface
  3. Explore interactive dependency graphs showing:
    • Upstream models and sources feeding into current model
    • Downstream models consuming current model output
    • Cross-project dependencies and external table references
Lineage features:
  • Interactive navigation - Click nodes to jump between related models
  • Dependency depth control - Adjust how many levels of dependencies to display
  • Impact analysis - Understand which models will be affected by changes
  • Visual debugging - Identify circular dependencies and optimization opportunities