Skip to main content

Snowflake Documentation

Secured data sharing

Snowflake provides the secure sharing option within the same region with strong security. Unravel leverages the Snowflake data sharing with secure data sharing option for data transfer over shared connections outside the Virtual Private Cloud (VPC) or through private links. This involves employing a serverless/warehouse task to independently replicate metadata from the schema in the customer Snowflake account, allowing you to share metadata directly with the Unravel Snowflake account in the same region. The data is polled and shared every 1 hour by default. The polling can be configured.

For secure data share from Snowflake, complete the prerequisites and configure the Secure data share option.

The solution provides the following benefits:

  • No requirement for a direct connection to Snowflake

  • Rapid setup and reduced complexity for security permissions

  • Availability of all Unravel features with configurable latency

Architecture

Unravel Snowflake Architecture - Secure Share

The Unravel Snowflake Architecture - Secure share is a comprehensive framework designed to integrate with the Snowflake data warehouse, providing users with robust data analytics and management capabilities. The architecture is divided into three main layers: User Layer, Core Layer, and Data Source Layer, each serving distinct purposes and functionalities.

The architecture emphasizes secure communication pathways, including external connections, HTTP/HTTPS traffic, and internal communications, ensuring data integrity and confidentiality. It provides a scalable and efficient platform for data analytics, capable of integrating with various external systems and notifying users through multiple channels

  • User Layer: This layer interfaces directly with users and clients, offering web application access, REST API interactions, and external notifications through platforms such as Email, Slack, and Microsoft Teams. It ensures secure and authenticated access using username/password for web users and revocable tokens for API clients.

  • Core Layer: Deployed on virtual machines or cloud services like AWS, Azure, and SaaS, this layer includes critical components such as NGINX for load balancing, a notification service, various analytical and service modules, and a datastore. The Core Layer facilitates seamless data management and insight generation through components like the Intelligent Insight Generator (IIG) and Snowflake Worker. It supports development environments like Python and Java and runs on CentOS.

  • Data Source Layer: This layer securely connects to the Snowflake data warehouse using a Snowflake-JDBC secure link with local authentication. It handles both shared and customer-specific data, ensuring that data operations are performed within the same region (us-east-1) to optimize performance and reduce latency.

Secure Data Sharing Process
Snowflake-architecture.png

The secure data sharing process in this architecture follows these steps:

  • Authentication: Users authenticate using SAML-based SSO or local credentials. API clients use revocable tokens, while messaging and email systems connect by secure webhooks or SMTP.

  • User Request Initiation: Users or applications request data or analytics via the Unravel SaaS web interface or the REST API.

  • Request Handling: The Snowflake Worker, assisted by the Intelligent Insight Generator, processes these requests and determines the data requirements.

  • Establish Secure Connection: The Snowflake Worker creates a secure, encrypted JDBC connection to the Snowflake environment, using keypair authentication.

  • Initiate Secure Share: The Snowflake environment in the Unravel VPC securely shares required data with customer accounts in the Customer VPC. This is accomplished through Snowflake’s Secure Share feature, providing read-only, reference-based data access.

  • Multi-Account Access: Multiple customer accounts, distributed across regions, are able to securely access the same shared data while maintaining their own governance and access controls.

  • Result Delivery and Notification: Processed insights and results are delivered to users through their chosen channels, such as web, API, email, Slack, or Teams.

Prerequisites for secured Data sharing

The following prerequisites must be fulfilled before you use the secured data-sharing option:

  • Assign the ACCOUNTADMIN role to the user.

  • Grant SELECT permission on the schema SNOWFLAKE.ACCOUNT_USAGE.

Contact Unravel Support for Unravel account ID corresponding to the platform, region, and edition.

Secure Data Sharing Configuration Options

Unravel provides two methods for secure data sharing between the customer and Unravel. Choose one secure sharing option based on your business needs.

  • Health Check (POC): One-time data share from the customer to Unravel.

    Select Health Check for a one-time analysis of your Snowflake environment. It is ideal for quick assessments to optimize cost and performance without setting up continuous data flow.

  • Snow-Share Live: Continuous data sharing from the customer to Unravel.

    Select Snow-Share Live for ongoing monitoring and continuous performance insights. It is best suited for organizations that require automated updates and continuous optimization of Snowflake workloads.

Health Check: One-Time Data Share

Use the Health Check option to perform a one-time secure data transfer from your Snowflake environment to Unravel. This method is typically used for a snapshot of your data at a particular point in time for analysis and recommendations.

To configure the Health Check option:

  1. Keep the following information ready:

    • Unravel account ID (provided by Unravel based on your Snowflake region). You must replace it with the correct <Unravel Account ID> based on your Snowflake region and account type. Contact Unravel Support to obtain the appropriate Account ID for your region.

    • Warehouse name (created in your Snowflake account). You must replace SET WAREHOUSE_NAME = '<WAREHOUSE_NAME>'; with the created warehouse name.

  2. Update <WAREHOUSE_NAME> and <Unravel Account ID> with the information from step 1 in this link.

  3. You can also update the following optional parameters if required. The default values will be applicable if these are not updated.

    • DATABASE_TO_SHARE: Name of the database to share.

    • SCHEMA_TO_SHARE: Name of the schema to share.

    • SHARE_NAME: Name of the secure share to create.

    • PROFILE_QUERY_CREDIT: Minimum cost a query must incur to qualify for query profile statistics.

    • R_DAYS: Number of days of real-time queries to poll.

    • H_DAYS: Number of days of historical queries to poll.

  4. Execute the update SQL script provided in Step 2 to create the required procedures and functions.

  5. Configure and schedule continuous polling and data retention tasks.

    After completing Step 4, execute the following steps to retrieve the agreed-upon duration (e.g., 35 days) and continuously collect the data.

    The parameters that you configured in Step 3 are applied in Step 5. If you configured any of the following optional parameters in Step 3, specify these parameters again here.

    • DATABASE_TO_SHARE: Name of the database to share.

    • SCHEMA_TO_SHARE: Name of the schema to share.

    SET DATABASE_TO_SHARE = 'UNRAVEL_DB_SHARE';
    SET SCHEMA_TO_SHARE = 'UNRAVEL_SCHEMA_SHARE';
    USE IDENTIFIER($DATABASE_TO_SHARE);
    USE SCHEMA IDENTIFIER($SCHEMA_TO_SHARE);
    
    /**
    Step-1 (One time execution for POV for X(35) days)
    */
    CALL create_table_from_snowflake((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'), 'QUERY_HISTORY');
    
    CALL create_table_from_snowflake((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE') , 'ACCESS_HISTORY');
    
    CALL CREATE_TABLES((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'));
    
    CALL REPLICATE_ACCOUNT_USAGE((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'), '35');
    
    CALL REPLICATE_STORAGE_METADATA((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'),'35');
    
    CALL REPLICATE_HISTORY_QUERY((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'), '35');
    
    CALL WAREHOUSE_PROC((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'));
    
    CALL CREATE_QUERY_PROFILE(dbname => (SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), schemaname =>  (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'), credit => (SELECT VALUE FROM config_parameters where CONFIG_ID = 'PROFILE_QUERY_CREDIT'), days => '14');
    
    /**
    Select and run REPLICATE_REALTIME_QUERY_BY_WAREHOUSE procedure if you wish to get real-time queries by warehouse name.It will select a maximum of 10,000 real-time queries for each warehouse at intervals of 1 hours.
    */
    
    CALL REPLICATE_REALTIME_QUERY_BY_WAREHOUSE((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'R_DAYS'));
    
    CALL create_shared_db_metadata((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'));
    
    
    /**
      Step-2 (Data sharing with Unravel account)
    */
    CALL SHARE_TO_ACCOUNT((SELECT VALUE FROM config_parameters where CONFIG_ID = 'ACCOUNT_ID'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SHARE_NAME'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'));
Snow-Share Live: Continuous Data Share

Use the Snow-Share Live option to set up continuous, secure data sharing between your Snowflake environment and Unravel.

To configure the snow-share live option:

  1. Keep the following information ready:

    • Unravel account ID (provided by Unravel based on your Snowflake region). You must replace it with the correct <Unravel Account ID> based on your Snowflake region and account type. Contact Unravel Support to obtain the appropriate Account ID for your region.

    • Warehouse name (created in your Snowflake account). You must replace SET WAREHOUSE_NAME = '<WAREHOUSE_NAME>'; with the created warehouse name.

  2. Update <WAREHOUSE_NAME> and <Unravel Account ID> with the information from step 1 in this link.

  3. You can also update the following optional parameters if required. The default values will be applicable if these are not updated.

    • DATABASE_TO_SHARE: Name of the database to share.

    • SCHEMA_TO_SHARE: Name of the schema to share.

    • SHARE_NAME: Name of the secure share to create.

    • PROFILE_QUERY_CREDIT: Minimum cost a query must incur to qualify for query profile statistics.

    • R_DAYS: Number of days of real-time queries to poll.

    • H_DAYS: Number of days of historical queries to poll.

  4. Execute the update SQL script provided in Step 2 to create the required procedures and functions.

  5. Configure and schedule continuous polling and data retention tasks.

    After completing Step 4, execute the following steps to retrieve the agreed-upon duration (e.g., 180 days) and continuously collect the data.

    The parameters that you configured in Step 3 are applied in Step 5. If you configured any of the following optional parameters in Step 3, specify these parameters again here.

    • DATABASE_TO_SHARE: Name of the database to share.

    • SCHEMA_TO_SHARE: Name of the schema to share.

    SET DATABASE_TO_SHARE = 'UNRAVEL_DB_SHARE';
    SET SCHEMA_TO_SHARE = 'UNRAVEL_SCHEMA_SHARE';
    USE IDENTIFIER($DATABASE_TO_SHARE);
    USE SCHEMA IDENTIFIER($SCHEMA_TO_SHARE);
    
    /**
    Step-1 (One time execution for POV for X(180) days)
    */
    CALL create_table_from_snowflake((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'), 'QUERY_HISTORY');
    
    CALL create_table_from_snowflake((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE') , 'ACCESS_HISTORY');
    
    CALL CREATE_TABLES((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'));
    
    CALL REPLICATE_ACCOUNT_USAGE((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'), '180');
    
    CALL REPLICATE_STORAGE_METADATA((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'),'180');
    
    CALL REPLICATE_HISTORY_QUERY((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'), '180');
    
    CALL WAREHOUSE_PROC((SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'));
    
    CALL CREATE_QUERY_PROFILE(dbname => (SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), schemaname =>  (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'), credit => (SELECT VALUE FROM config_parameters where CONFIG_ID = 'PROFILE_QUERY_CREDIT'), days => '14');
    
    
    /**
    Step-2  Create task using procedure (continuous polling data task)
    Run only if you have executed Steps-1 and share db to unravel.
    */
    CALL create_tasks_with_schedule((SELECT VALUE FROM config_parameters where CONFIG_ID = 'WAREHOUSE_NAME'),
    (SELECT VALUE FROM config_parameters where CONFIG_ID = 'REPLICATE_METADATA'),
    (SELECT VALUE FROM config_parameters where CONFIG_ID = 'REPLICATE_STORAGE_METADATA'),
    (SELECT VALUE FROM config_parameters where CONFIG_ID = 'REPLICATE_HISTORY_QUERY'),
    (SELECT VALUE FROM config_parameters where CONFIG_ID = 'REPLICATE_WAREHOUSE_AND_REALTIME_QUERY'),
    (SELECT VALUE FROM config_parameters where CONFIG_ID = 'CLEANUP_DATA_TASK'),
    (SELECT VALUE FROM config_parameters where CONFIG_ID = 'CREATE_SHARED_DB_METADATA'));
    
    /**
     Step-3 (START ALL THE TASKS)
     */
    ALTER TASK replicate_metadata RESUME;
    ALTER TASK replicate_storage_metadata RESUME;
    ALTER TASK replicate_history_query RESUME;
    ALTER TASK replicate_warehouse_and_realtime_query RESUME;
    ALTER TASK cleanup_data_task RESUME;
    ALTER TASK create_shared_db_metadata_task RESUME;
    
    /**
      Step-4 (Data sharing with Unravel account)
    */
    CALL SHARE_TO_ACCOUNT((SELECT VALUE FROM config_parameters where CONFIG_ID = 'ACCOUNT_ID'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SHARE_NAME'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'DATABASE_TO_SHARE'), (SELECT VALUE FROM config_parameters where CONFIG_ID = 'SCHEMA_TO_SHARE'));