Configuring FSImage (4.7.0.1 onwards)

Important

FSImage applies only to the CDH, CDP, and HDP platforms.

In Hadoop, the FSImage is a file stored on the OS file system of the NameNode. It contains the complete directory structure (namespace) of HDFS and, for each file, metadata such as the list of its blocks. The locations of those blocks on the DataNodes are not stored in the FSImage; the NameNode rebuilds them from DataNode block reports.

Unravel uses the FSImage to populate some of the Data page features and content, such as file reports and table sizes.

Note

The FSImage status is enabled by default. To disable the feature, see Disable FSImage status.

The etl_fsimage task imports the latest FSImage from the NameNode and processes it for each of the connected clusters. FSImage processing involves file report generation and table size extraction. The run time of the task is proportional to the size of the FSImage.

Caution

FSImage is a snapshot that becomes outdated with time. The older the image, the more it diverges from the real-time structure.

In Unravel, you can configure FSImage for a single cluster environment or a multi-cluster environment.

Important

FSImage is processed by the Unravel ondemand process every day at 00:00 UTC. The latest FSImage should be uploaded to the Unravel core node a short time before 00:00 UTC to guarantee data freshness.

Set cores and memory to process FSImage

You can set Unravel properties to define the resources to process the FSImage. Run the following steps to define the resources.

Note

In a multi-cluster environment, you must perform the following steps on the core node.

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. FSImage processing uses a standalone Spark process. By default, this process runs with 4 cores and 16 GB of memory, which suits a small FSImage file (less than 10 GB).

    To support larger FSImage files, set the configuration as follows:

    <Unravel installation directory>/unravel/manager config ondemand fsimage resource <cores> <memory>
    
    ##For example:
    /opt/unravel/manager config ondemand fsimage resource 4 10g
    
  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
    
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start
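
As a rough aid for deciding whether the default resources are likely to be enough, the following sketch compares an FSImage size in bytes against the 10 GB figure quoted above. The function name fsimage_resource_hint and the reading of 10 GB as 10 * 1024^3 bytes are assumptions made for illustration. The sketch does not suggest specific core or memory values.

```shell
#!/bin/sh
# Sketch: check an FSImage size against the 10 GB "small FSImage" figure.
# Assumption: 10 GB is taken as 10 * 1024^3 bytes; the function name is hypothetical.

FSIMAGE_SMALL_LIMIT_BYTES=$((10 * 1024 * 1024 * 1024))

# Prints "default" when the size is below the limit, otherwise "increase".
fsimage_resource_hint() {
  local size_bytes="$1"
  if [ "$size_bytes" -lt "$FSIMAGE_SMALL_LIMIT_BYTES" ]; then
    echo "default"
  else
    echo "increase"
  fi
}

# Example: a 5 GB image (5368709120 bytes) stays on the defaults.
# fsimage_resource_hint 5368709120    # prints: default
```

If the hint is "increase", choose the <cores> and <memory> values for your environment and set them with the config ondemand fsimage resource command shown in step 2.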

Configure FSImage in a single cluster environment

In a single cluster environment, the configuration steps depend on whether you have the hdfs dfsadmin permissions needed to access the FSImage.

Configure FSImage with hdfs dfsadmin permissions

In a single cluster environment, if you are an Unravel user with hdfs dfsadmin privileges, you can run the following steps to download and configure the FSImage:

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. Enable FSImage with automatic fetch.

    <Unravel installation directory>/unravel/manager config ondemand fsimage enable --automatic-fetch

  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
    
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start

  5. Run the following command to trigger the FSImage import.

    curl -v http://localhost:5000/small-files-etl

Configure FSImage without hdfs dfsadmin permissions

In a single cluster environment, if you are an Unravel user without hdfs dfsadmin permissions, another user who has hdfs dfsadmin permissions can manually fetch, parse, and upload the FSImage. Afterward, you (the Unravel user without hdfs dfsadmin permissions) can download and configure the FSImage.

As a user with hdfs dfsadmin permissions, run the following commands to fetch the raw FSImage from the HDFS NameNode and parse it into a tab-separated text file.

hdfs dfsadmin -fetchImage <path to fsimage file on local machine>
hdfs oiv <path to fsimage file on local machine>

Ensure that the FSImage for Unravel is downloaded to the /opt/unravel/tmp/ondemand_fsimage directory. This is the default directory.

Note

Unravel recommends that you keep the default directory unless there are space constraints. In that case, you can change the default location as follows:

<Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage> 

##For example: 
/opt/unravel/manager config ondemand fsimage location /tmp/ondemand

If you provide a different directory for the latest FSImage, ensure that the Unravel user has read permission on that directory.
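
The fetch and parse commands above can be wrapped in a small script, sketched below. The oiv options used here (-p Delimited, -i, -o) are standard Hadoop Offline Image Viewer options, and the Delimited processor writes tab-separated text by default. Whether Unravel expects exactly this output format and file name is an assumption, so verify it against your Hadoop and Unravel versions. The function names and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch: fetch the latest FSImage and parse it into tab-separated text.
# Assumptions: the oiv options and the output file name are not from this page.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

fetch_and_parse_fsimage() {
  local image_file="$1"   # local path for the raw FSImage
  local parsed_file="$2"  # local path for the parsed text file
  # Stop here if the fetch fails, so a stale image is never parsed.
  run hdfs dfsadmin -fetchImage "$image_file" || return 1
  run hdfs oiv -p Delimited -i "$image_file" -o "$parsed_file"
}

# Example (run as a user with hdfs dfsadmin permissions):
# fetch_and_parse_fsimage /opt/unravel/tmp/ondemand_fsimage/fsimage /opt/unravel/tmp/ondemand_fsimage/fsimage.tsv
```

Running the function first with DRY_RUN=1 shows the exact commands before anything touches the NameNode.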

As an Unravel user, do the following to configure FSImage:

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. Enable FSImage.

    <Unravel installation directory>/unravel/manager config ondemand fsimage enable
    
    For example: 
    /opt/unravel/manager config ondemand fsimage enable
    
  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
    
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start

  5. Run the following command to trigger the FSImage import.

    curl -v http://localhost:5000/small-files-etl

Configure FSImage in a multi-cluster deployment

This section provides instructions to configure FSImage in a multi-cluster deployment for Unravel version 4.7.0.1 onwards. In the multi-cluster environment, the following are applicable:

  • Only a user with hdfs dfsadmin permissions can fetch, parse and upload the FSImage. Such a user can be an Unravel user or any other user.

  • You should use rsync to upload the FSImage from the Unravel edge node to the Unravel core node.

    On the Unravel core node, set up the SSH access that rsync requires: add the Unravel edge node as a well-known SSH host, and add the public RSA key of the user who uploads the FSImage and runs the cron job to the authorized SSH keys.

    Execute the following steps on the edge node to authorize SSH keys on Unravel core node:

    1. Add the public SSH key of the user to the Unravel core node user's $HOME/.ssh/authorized_keys file.

    2. Add the Unravel edge node hostname as a known host on the Unravel core node.

    3. Run the following commands for SSH passwordless login for rsync command execution. You can skip the step to generate the keys if you already have the public keys.

       ssh-keygen -t rsa   ## Skip this step if you already have the public keys.
       ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> mkdir -p .ssh
       cat ~/.ssh/id_rsa.pub | ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> 'cat >> ~/.ssh/authorized_keys'
       ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> "chmod 700 ~/.ssh; chmod 640 ~/.ssh/authorized_keys"
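
Before relying on rsync, it can help to confirm that the passwordless login works. The sketch below runs a no-op command on the core node with the OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password. The function names, the 10-second timeout, and the DRY_RUN switch are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: verify passwordless SSH from the edge node to the core node.
# BatchMode=yes disables password prompts, so a missing key shows up as a failure.
# Set DRY_RUN=1 to print the command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

check_passwordless_ssh() {
  local user="$1"  # core node user
  local host="$2"  # core node hostname
  run ssh -o BatchMode=yes -o ConnectTimeout=10 "${user}@${host}" true
}

# Example:
# check_passwordless_ssh <UNRAVEL_CORE_NODE_USER> <UNRAVEL_CORE_NODE_HOSTNAME> && echo "passwordless SSH works"
```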

To configure FSImage in a multi-cluster environment, do the following:

  1. Run the following on the core node:

    1. Stop Unravel.

      <Unravel installation directory>/unravel/manager stop
      
    2. Enable FSImage configuration.

      <Unravel installation directory>/unravel/manager config ondemand fsimage enable
      

      /opt/unravel/tmp/ondemand_fsimage is the default location where the FSImage is added.

      Note

      Unravel recommends that you keep the default location unless there are space constraints. In that case, you can change the default location as follows:

      <Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage> 
      
      ##For example: 
      /opt/unravel/manager config ondemand fsimage location /tmp/ondemand
      

      If you provide a different directory for the latest FSImage, ensure that the Unravel user has read permission on that directory.

    3. Apply the changes.

      <Unravel installation directory>/unravel/manager config apply
      
    4. Start Unravel.

      <Unravel installation directory>/unravel/manager start

  2. Run the following steps on each of the edge nodes:

    If you do not have hdfs dfsadmin permissions, another user who has hdfs dfsadmin permissions can manually fetch, parse, and upload the FSImage. Afterward, you (the Unravel user without hdfs dfsadmin permissions) can download and configure the FSImage.

    As a user with hdfs dfsadmin permissions, do the following to fetch and parse the FSImage:

    hdfs dfsadmin -fetchImage <path to fsimage file on local machine>
    hdfs oiv <path to fsimage file on local machine>

    Note

    If the cluster is Kerberos-enabled, run the following command to set the Kerberos authentication for the user with hdfs dfsadmin permissions:

    <Unravel installation directory>/unravel/manager config ondemand fsimage kerberos /path/to/keytab user@REALM
    

    The externally fetched FSImage should be placed in /opt/unravel/tmp/ondemand_fsimage on the core node. This is the default location.

    As an Unravel user, do the following to configure FSImage:

    1. Stop Unravel.

      <Unravel installation directory>/unravel/manager stop
      
    2. Ensure that passwordless SSH login is set up for rsync command execution.

    3. Run the following command to upload the FSImage to the location set on the core node:

      <Unravel installation directory>/unravel/manager run ondemand fsimage fetch --upload-to-core

      Note

      If you changed the default location on the core node (see substep 2 of step 1 above), run the following command to point to the changed location for the FSImage upload.

      <Unravel installation directory>/unravel/manager config ondemand fsimage location --remote <FSImage/location/configured/on/core/node>
      
      For example:
      <Unravel installation directory>/unravel/manager config ondemand fsimage location --remote /opt/unravel/data/tmp/reports/fsimage
      
    4. Apply the changes.

      <Unravel installation directory>/unravel/manager config apply
      
    5. Start Unravel.

      <Unravel installation directory>/unravel/manager start

  3. On the core node, trigger the FSImage import.

    curl -v http://localhost:5000/small-files-etl

Configure for FSImage download by external users

You can configure Unravel for FSImage download by external users, in both single cluster and multi-cluster environments, as follows:

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. Enable the ondemand FSImage download.

    <Unravel installation directory>/unravel/manager config ondemand fsimage enable
    
  3. Optionally, you can change the default location for downloading FSImage. /opt/unravel/data/tmp/reports/fsimage is the default location where the FSImage is downloaded.

    <Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage> 

    ##For example:
    /opt/unravel/manager config ondemand fsimage location /tmp/reports/fsimage

  4. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
    
  5. Start Unravel.

    <Unravel installation directory>/unravel/manager start

  6. Run the following command to trigger the FSImage import.

    curl -v http://localhost:5000/small-files-etl

Create a cron job to upload FSImage

You can create a cron job to upload the FSImage to the Unravel server. The upload time depends on the size of the FSImage and the network bandwidth. Assess this time to determine how often to run the cron job, and configure it accordingly.

FSImage is processed by the Unravel ondemand process every day at 00:00 UTC. To keep the data fresh, upload the latest FSImage shortly before 00:00 UTC. Before you schedule the upload, measure the total time that the upload script takes to run, and set the cron job start time so that Unravel has access to the fresh FSImage before 00:00 UTC.
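
The timing can be worked out with simple arithmetic, sketched below: subtract the measured run time and a safety margin from midnight, and use the result as the cron start time. The function name cron_start_fields, the margin parameter, and the script path in the sample crontab line are hypothetical. The sketch also assumes that the server clock, and therefore cron, runs in UTC.

```shell
#!/bin/sh
# Sketch: compute cron "minute hour" fields so that an upload finishes before 00:00.
# Assumptions: cron runs in UTC, and both arguments are whole minutes whose sum
# is more than 0 and less than 1440 (one day).

cron_start_fields() {
  local duration_min="$1"  # measured run time of the fetch, parse, and upload script
  local margin_min="$2"    # extra safety margin before 00:00
  local total=$((duration_min + margin_min))
  if [ "$total" -le 0 ] || [ "$total" -ge 1440 ]; then
    echo "total lead time must be between 1 and 1439 minutes" >&2
    return 1
  fi
  local start=$((1440 - total))  # start time in minutes after the previous midnight
  echo "$((start % 60)) $((start / 60))"
}

# Example: a 45-minute run with a 30-minute margin starts at 22:45.
# cron_start_fields 45 30    # prints: 45 22
# Matching crontab line (the script path is hypothetical):
# 45 22 * * * /path/to/upload_fsimage.sh
```

If the server does not run in UTC, convert the computed time to local time before adding it to the crontab.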

Verify the FSImage configuration

After you have successfully fetched the FSImage, go to the UI and verify the FSImage configuration.

Important

By default, the table_worker daemon checks table sizes every 24 hours. Therefore, even after the FSImage is processed, it can take up to 24 hours for the sizes to appear. To see the sizes sooner, restart the table_worker daemon.

Tip

  • The relevant log file is <unravel-installation-directory>/logs/ondemand_tasks.out

  • Run one of the following commands to display the progress of the etl_fsimage task.

    egrep 'ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' ondemand_tasks.out
    grep etl_fsimage\(\) unravel_ondemand.out

  • Run one of the following commands to display the progress of the run_small_files task, which starts whenever the Small Files Report is triggered from the UI.

    egrep 'SMALL_FILES_REPORT|FSIMAGE_REPORTS_UTILS' ondemand_tasks.out
    grep run_small_files\(\) ondemand_tasks.out
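
The progress checks above can be wrapped in a small helper, sketched below. It uses grep -E, which matches the same patterns as egrep (recent GNU grep versions mark egrep as obsolescent). The function name and the optional line-count argument are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: show the most recent etl_fsimage progress lines from a log file.
# The patterns are the ones used in the tip above; the function name is hypothetical.

fsimage_progress() {
  local log_file="$1"        # for example, the ondemand_tasks.out log
  local max_lines="${2:-20}" # how many of the latest matching lines to show
  grep -E 'ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' "$log_file" | tail -n "$max_lines"
}

# Example:
# fsimage_progress <unravel-installation-directory>/logs/ondemand_tasks.out 50
```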

Disable FSImage status

By default, the FSImage status is enabled. If you want to disable FSImage, perform the following steps.

Note

In a multi-cluster environment, you must perform the following steps on the core node.

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. Change the setting.

    <Unravel installation directory>/unravel/manager config ondemand fsimage disable
    
  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
    
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start
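
The configuration procedures on this page share the same stop, configure, apply, and start sequence, so they can be wrapped in one helper, sketched below. The helper only chains the manager commands shown on this page. The function names, the UNRAVEL_MANAGER variable (defaulting to the /opt/unravel/manager path used in the examples), and the DRY_RUN switch are assumptions.

```shell
#!/bin/sh
# Sketch: run one "config ondemand fsimage ..." change between stop and start.
# Set DRY_RUN=1 to print the commands instead of running them.

MANAGER="${UNRAVEL_MANAGER:-/opt/unravel/manager}"

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments are passed through after "config ondemand fsimage".
# Note: if a step fails, the function returns early and Unravel stays stopped.
apply_fsimage_config() {
  run "$MANAGER" stop || return 1
  run "$MANAGER" config ondemand fsimage "$@" || return 1
  run "$MANAGER" config apply || return 1
  run "$MANAGER" start
}

# Examples:
# apply_fsimage_config disable
# apply_fsimage_config enable --automatic-fetch
```

Returning early on failure keeps a half-applied configuration from being started; check the error, fix it, and rerun the remaining manager commands by hand.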