Skip to content

Kubernetes Container Storage Interface (CSI) Driver

Overview

CyVerse Discovery Environment uses Kubernetes to orchestrate interactive applications including Jupyter, RStudio, Remote Desktops, etc.

The Kubernetes Container Storage Interface (CSI) is a standardized interface that allows Kubernetes to interact with a wide variety of storage systems. By utilizing CSI, Kubernetes can dynamically provision, attach, mount, and manage storage volumes across different storage providers without being tightly coupled to any specific storage technology. This flexibility is crucial for ensuring that your containerized applications have reliable and consistent access to the storage they require, regardless of the underlying infrastructure.

Purpose in the Discovery Environment VICE Apps

In our workbench environment, the CSI driver plays a key role in managing data in the Data Store.

We have integrated the CSI driver to facilitate seamless access and management of user and community data stored in the Data Store within containerized applications.

The driver creates volume mounts at specific paths in our containers:

/data-store: A root of the volume mounts for accessing community shared and personal data in the Data Store. This directory contains following four sub-directories.

/data-store/input: A directory providing access to the input data specified when launching the app. If an input is not specified for the app, the directory will not be created. /data-store/output: A directory providing access to the output data for the app. The directory will be empty when the app starts. /data-store/iplant/home/<username>: A directory providing access to the user's home in the Data Store. /data-store/iplant/home/shared: A directory providing read-only access to the community shared data in the Data Store.

The driver also creates a symbolic link ~/data-store to the root of the volume mounts /data-store for convenient access in some apps, such as Jupyter Notebook.

Limitations of the CSI Driver

While the Kubernetes Container Storage Interface (CSI) driver provides a flexible and powerful way to manage storage in Kubernetes, it does have some limitations, especially when it comes to latency-sensitive operations.

Latency Concerns

The CSI driver introduces an abstraction layer between Kubernetes and the underlying storage system, which can add latency to storage operations. The latency becomes more significant when the underlying storage system is remote, such as the Data Store. This latency might not be noticeable for simple data retrieval tasks, but it can become significant in certain scenarios:

High-frequency File Operations: When dealing with operations that involve the creation, modification, or deletion of a large number of small files (e.g., tens of thousands or more), the additional overhead from the CSI layer and the data latency between the Discovery Environment and the Data Store can lead to performance bottlenecks.

Real-time Data Transfers: For applications that require low-latency, real-time data transfers, such as those using curl, wget, or similar tools, the latency introduced by the CSI driver might affect performance. The overhead of managing the connection between Kubernetes and the Data Store could lead to slower transfer speeds, especially for large-scale data transfers or when dealing with numerous small files.

When Not to Use CSI

Given these limitations, it may be more effective to avoid using the CSI driver in the following scenarios:

Real-time Data Transfer Tools: When using tools like curl and wget for downloading or uploading data, particularly in environments where speed and low-latency are critical, it may be better to use a directory other than /data-store or ~/data-store. This allows the tools to utilize local storage, which can offer low-latency than the Data Store.

Massive File Operations: For workloads that involve the transfer or manipulation of many thousands of files, especially if they are small, consider using a directory other than /data-store or ~/data-store. This allows the workloads to utilize local storage without the additional overhead of the CSI driver.

When to Use the CSI Driver

Despite some limitations, the Kubernetes Container Storage Interface (CSI) driver excels in many scenarios, particularly where flexibility, scalability, and ease of integration are important. Below are some examples of when using the CSI driver is highly advantageous:

Ideal Use Cases

Interactive Notebooks: For environments like Jupyter Notebooks, the CSI driver is an excellent choice. Notebooks often require seamless access to persistent storage for saving and retrieving notebooks, datasets, and outputs. The CSI driver provides the necessary abstraction to manage data in the Data Store dynamically, allowing for a smooth and efficient workflow.

Small to Medium Data Sets: When dealing with small to medium-sized datasets, the CSI driver is well-suited for the job. These datasets can be easily managed and accessed via the volume mount paths (/data-store and ~/data-store), providing users with consistent and reliable storage without the need for complex configurations.

Persistent Storage for Stateful Applications: Applications that require persistent storage -- such as databases, stateful services, or applications that need to maintain state between restarts -- benefit from the CSI driver’s ability to manage storage volumes effectively. The driver ensures that data remains consistent and available, even as containers are moved across nodes within the Kubernetes cluster.

Summary

The CSI driver offers Discovery Environment Vice Apps a robust interface for accessing user and community-shared data in the Data Store. It is designed to provide flexible and user-friendly access, particularly when data access needs are moderate. It is an ideal choice for interactive environments, development workflows, and applications that need reliable and persistent access to storage without the overhead and complexity associated with direct storage management.