Glossary & Acronyms¶
This glossary is intended to help you become more familiar with terms used in the CyVerse ecosystem, in Open Science more broadly, and beyond.
A¶
action: automate a workflow in the context of CI/CD, see GitHub Actions
agile: development methodology for organizing a team to complete tasks organized over short periods called ‘sprints’
allocation: portion of a resource assigned to a particular recipient, typical unit is a core or node hour
Anaconda: open source data science platform. Anaconda.com
application: also called an ‘app’, software designed to help the user perform a specific task
awesome: a curated set of lists that provide insight into awesome software projects on GitHub
AVU: Attribute-Value-Unit, the three components of an iRODS metadata entry
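To make the AVU entry above more concrete, here is a minimal sketch of how attribute-value-unit triples could be modeled as a data structure. This is not the iRODS API; the class name `AVU`, the helper `find`, and the sample metadata are all hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class AVU(NamedTuple):
    """One metadata entry: an attribute name, its value, and an optional unit."""
    attribute: str
    value: str
    unit: str = ""  # the unit may be left empty


# Hypothetical metadata attached to a single data object
metadata = [
    AVU("temperature", "21.5", "celsius"),
    AVU("species", "Zea mays"),
]


def find(entries, attribute):
    """Return every entry whose attribute name matches exactly."""
    return [entry for entry in entries if entry.attribute == attribute]


print(find(metadata, "temperature"))
```

Keeping the unit as a separate field, rather than folding it into the value, is what lets metadata be searched by attribute and compared by value independently.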
B¶
beta: \(\beta\), a software version which is not yet ready for publication but is being tested
bash: Bash is the GNU Project’s shell, the Bourne-Again Shell
biocontainer: a community-driven project that provides the infrastructure and basic guidelines to create, manage and distribute bioinformatics packages (e.g. conda) and containers (e.g. docker, singularity)
bioconda: a channel for the conda package manager specializing in bioinformatics software
C¶
CLI: (1) the UNIX shell command line interface, most typically BASH (2) the CyVerse Learning Institute
command: a set of instructions sent to the computer, typically in a typed interface
conda: an installation type of the Anaconda data science platform. Command line application for managing packages and environments
container: virtualization of an operating system run within an isolated user space
Continuous Integration: (CI) is testing automation to check that the application is not broken whenever new commits are integrated into the main branch
Continuous Delivery: (CD) is an extension of ‘continuous integration’ to make sure that you can release new changes in a sustainable way
Continuous Deployment: a step further than ‘continuous delivery’, every change that passes all stages of your production pipeline is released
Continuous Development: a process for iterative software development and is an umbrella over several other processes including ‘continuous integration’, ‘continuous testing’, ‘continuous delivery’ and ‘continuous deployment’
Continuous Testing: a process of running automated tests throughout software development, so that problems are caught as changes are made
CRAN: The Comprehensive R Archive Network
CyVerse tool: Software program that is integrated into the back end of the DE for use in DE apps
CyVerse app: graphic interface of a tool made available for use in the DE
D¶
Debian: a free OS, base of other Linux distributions such as Ubuntu
Development: the environment on your computer where you write code
DevOps: Software *Dev*elopment and information technology *Op*erations, techniques for shortening the time to change software in relation to CI/CD
Discovery Environment (DE): a data science workbench for running executable, interactive, and high throughput applications in CyVerse
distribution: abbreviated as ‘distro’, an operating system made from a software collection based upon the Linux kernel
Docker: Docker is an open source software platform to create, deploy and manage virtualized application containers on a common operating system (OS), with an ecosystem of allied tools. A program that runs and handles life-cycle of containers and images
DockerHub: the official registry of Docker container images, operated by Docker. DockerHub
DOI: a digital object identifier. A persistent identifier, managed through doi.org
Dockerfile: a text document that contains all the commands you would normally execute manually in order to build a Docker image. Docker can build images automatically by reading the instructions from a Dockerfile
E¶
environment: software that includes operating system, database system, specific tools for analysis
entrypoint: In a Dockerfile, an ENTRYPOINT is an optional definition for the first part of the command to be run
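The entrypoint entry above describes ENTRYPOINT as the first part of the command to be run. The sketch below models, in plain Python, how that first part is combined with the rest of the command. It assumes the exec (list) form of ENTRYPOINT and CMD, where arguments supplied at run time replace the default CMD. The function `container_command` and the sample values are hypothetical and do not call Docker.

```python
def container_command(entrypoint, cmd, run_args=None):
    """Combine ENTRYPOINT with run-time arguments, or with CMD if none are given."""
    # Run-time arguments replace the default CMD; ENTRYPOINT always comes first.
    tail = run_args if run_args else cmd
    return list(entrypoint) + list(tail)


entrypoint = ["python", "app.py"]  # hypothetical ENTRYPOINT
cmd = ["--help"]                   # hypothetical default CMD

# No run-time arguments: the default CMD is appended
print(container_command(entrypoint, cmd))

# Run-time arguments take the place of CMD
print(container_command(entrypoint, cmd, ["--port", "8080"]))
```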
F¶
FOSS: (1) Free and Open Source Software, (2) Foundational Open Science Skills - this class!
function: a named section of a program that performs a specific task
G¶
git: a version control system software
gitter: a GitHub based messaging service that uses markdown gitter.im
GitHub: a website for hosting git repositories, owned by Microsoft GitHub
GitLab: a website for hosting git repositories GitLab
GitOps: using the git framework as a means of deploying infrastructure on the cloud using Kubernetes
GPU: graphics processing unit
GUI: graphical user interface
H¶
hack: a quick job that produces what is needed, but not well
HPC: high performance computing, for large synchronous computation
HTC: high throughput computing, for many parallel tasks
I¶
IaaS: Infrastructure as a Service, online services that provide APIs for provisioning computing infrastructure such as servers, storage, and networking
iCommands: command line application for accessing iRODS Data Store
IDE: integrated development environment, typically a graphical interface for working with code language or packages
instance: a single virtual machine
image: self-contained, read-only ‘snapshot’ of your applications and packages, with all their dependencies
iRODS: an open source integrated Rule-Oriented Data Management System, iRODS.org
J¶
Java: programming language, class-based, object-oriented
JavaScript: programming language
JSON: JavaScript Object Notation, data interchange format that uses human-readable text
Jupyter(Hub,Lab,Notebooks): an IDE, originally the IPython Notebook, that operates in the browser. Project Jupyter
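As an illustration of the JSON entry above, the following sketch uses Python's standard `json` module to turn a small record into human-readable text and back again. The record contents are hypothetical sample data.

```python
import json

# Hypothetical record describing a sequencing sample
record = {"name": "sample-01", "reads": 1200, "paired": True, "tags": ["rna", "leaf"]}

# Serialize to human-readable text; indent and sort_keys only affect layout
text = json.dumps(record, indent=2, sort_keys=True)
print(text)

# Parse the text back into Python objects
restored = json.loads(text)
print(restored == record)
```

Note that Python's `True` is written as `true` in the JSON text, since JSON defines its own small set of literal values.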
K¶
kernel: central component of most operating systems (OS)
Kubernetes: an open source container orchestration platform created by Google. Kubernetes is often referred to as K8s
L¶
lib: a UNIX library
linux: open source Unix-like operating system
M¶
makefile: a file containing a set of directives used by a make build automation tool
markdown: a lightweight markup language with plain text formatting syntax
metadata: data about data, useful for searching and querying
multi-thread: a process which runs on more than one CPU or GPU core at the same time
master node: responsible for deciding what runs on all of the cluster’s nodes. Can include scheduling workloads, like containerized applications, and managing the workloads’ lifecycle, scaling, and upgrades. The master also manages network and storage resources for those workloads
Mac OS X: Apple’s popular desktop OS
N¶
node: a computer, typically 1 or 2 core (with many threads) server in a cloud or HPC center
O¶
ontology: formal naming and structural hierarchy used to describe data, also called a knowledge graph
organization: a group, in the context of GitHub a place where developers contribute code to repositories
Operating System (OS): software that manages computer hardware, software resources, and provides common services for computer programs
Open Science Grid (OSG): national, distributed computing partnership for data-intensive research opensciencegrid.org
ORCID: Open Researcher and Contributor ID (ORCiD), a persistent digital identifier that distinguishes you from every other researcher
P¶
PaaS: Platform as a Service, run and manage applications in the cloud without the complexity of developing the platform yourself
package: an app designed for a particular language
package manager: a collection of software tools that automates the process of installing, upgrading, configuring, and removing computer programs for a computer’s operating system in a consistent manner
Production: environment where users access the final code after all of the updates and testing
Python: interpreted, high-level, general-purpose programming language Python.org
R¶
R: data science programming language R Project
recipe file: a file with installation scripts used for building software such as containers, e.g. Dockerfile
registry: a storage and content delivery system, such as that used by Docker
remote desktop: a VM with a graphic user interface accessed via a browser
repo(sitory): a directory structure for hosting code and data
RST: reStructuredText, a lightweight markup language comparable to markdown
ReadTheDocs: a web service for rendering documentation (that this website uses) readthedocs.org and readthedocs.com
root: the administrative user on a Linux system - use your powers wisely
S¶
SaaS: Software as a Service, a web based platform for using software
schema: a metadata standard for labeling, tagging or coding for recording & cataloging information or structuring descriptive records. see schema.org
scrum: daily set of tasks and evaluations as part of a sprint.
shell: a command line interface program that runs other programs (these may be complex, technical programs or very simple programs, such as one that makes a directory). These simple, stand-alone programs are called commands
Singularity: container software, used widely on HPC, originally developed at Lawrence Berkeley National Laboratory, with a version maintained by Sylabs
SLACK: Searchable Log of All Conversation and Knowledge, a team communication tool slack.com
sprint: set period of time during which specific work has to be completed and made ready for review
Singularity def file: (definition file) recipe for building a Singularity container
Stage: environment that is as similar to the production environment as can be for final testing
T¶
tar: software utility for collecting many files into one archive file, often referred to as a tarball
tensor: algebraic object that describes a linear mapping from one set of algebraic objects to another
terminal: a windowed emulator for directly entering commands to a computer
thread: a CPU process or a series of linked messages in a discussion board
tool: In the context of CyVerse Discovery Environment, a Docker Container
TPU: tensor processing unit
Travis: Travis-CI, a continuous integration software
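For the tar entry above, here is a minimal sketch of collecting several files into one archive and reading them back, using Python's standard `tarfile` module. To keep the example self-contained, the archive is built in memory rather than on disk, and the file names and contents are hypothetical.

```python
import io
import tarfile

# Hypothetical files to bundle: name -> contents
files = {"a.txt": b"alpha\n", "b.txt": b"beta\n"}

# Write a gzip-compressed tarball into an in-memory buffer
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)  # the size must be set before adding the member
        archive.addfile(info, io.BytesIO(data))

# Rewind and read the archive back
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    names = archive.getnames()
    first = archive.extractfile("a.txt").read()

print(names)
print(first)
```

The `w:gz` mode adds gzip compression on top of the archive; plain `w` would produce an uncompressed tarball holding the same files.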
U¶
Ubuntu: most popular Linux OS distribution, based on Debian
UNIX: operating system
user: the profile under which applications are started and run, root is the most powerful system administrator
V¶
VICE: Visual Interactive Computing Environment - CyVerse Data Science Workbench
virtual machine: a software computer that, like a physical computer, runs an operating system and applications
W¶
waterfall: software development broken into linear sequential phases, similar to a Gantt chart
webGL: JavaScript API for rendering interactive 2D and 3D graphics within any compatible web browser without the use of plug-ins
Windows: Microsoft’s most popular desktop OS
workspace: (vs. repo)
worker node: A cluster typically has one or more nodes, which are the worker machines that run your containerized applications and other workloads. Each node is managed from the master, which receives updates on each node’s self-reported status.
X¶
XML: Extensible Markup Language, data interchange format that uses human-readable text
Y¶
YAML: YAML Ain’t Markup Language, data interchange format that uses human-readable text
Z¶
ZenHub: team collaboration solution built directly into GitHub that uses kanban style boards
Zenodo: general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN
zip: a compressed file format
zsh: Z-Shell, now the default shell on recent versions of macOS (formerly Mac OS X)