
Glossary

The Aerospike schemaless data model gives application designers maximum flexibility. Aerospike uses the following terms to differentiate it from the relational database (RDBMS) world. In our documentation, we introduce Aerospike concepts with their corresponding common RDBMS terms.

A

ACID compliant

ACID compliance refers to the database transaction properties of atomicity, consistency, isolation and durability (ACID). Atomicity means that a transaction succeeds or fails as a whole, not one component at a time. Consistency means that all the data follows the appropriate data validation rules. Isolation means that multiple transactions can be processed simultaneously without interfering with one another. Durability means that data is saved once a transaction is completed, even if there is a system failure such as a power outage.

Complying with these principles ensures that database transactions are reliably processed.

Being ACID compliant is important because these four principles provide reliability, validity and correctness. They ensure that transactions remain correct despite problems such as network errors, disruptions or hardware failures. Organizations need ACID compliance because they need transactions to reliably succeed (or fail) for critical applications such as financial transactions or time-sensitive data. Industries that require ACID compliance include financial institutions, manufacturing operations, transportation, IoT environments and energy production.

analytical workload

An analytical workload is a broad collection of computing tasks that analyze a particular business process, market condition, user behavior, prediction, forecast, simulation, and myriad other use cases. Today’s analytical systems are designed to handle greater data volume, data complexity, unpredictability, velocity and variety. An analytical workload is created by users who are exploring data in real time, and delivers data and insights to a broad set of tools, dashboards, and data applications.

An analytical workload is valuable when users need to conduct analyses like response attribution, churn prediction and fraud detection. Its flexibility means that data can be looked at in various ways while still drawing correlations and predicting future possibilities from sophisticated models.

Today’s analytical workloads often analyze very large datasets. That’s important because more data than ever before is being collected by organizations, including user behavior, statistical analysis and IoT (Internet of Things) information.

API

An API (application programming interface) is a coded software interface used by programmers to accomplish computing tasks. Programmers use an API to interact with other products and services, often without having to know the details of how they’re implemented. An API doesn’t necessarily expose the internal details of how a system works, but it enables programmers to execute computing tasks in a repeatable, predictable way.

The advantage of an API is that it saves time and money because it doesn’t require programmers to write new code or spend time trying to figure out the nitty-gritty of an operation. An API also provides flexibility when managing existing tools or products or designing new ones. That can lead to greater innovation.

Another advantage of an API is that it improves collaboration because it enables platforms and apps to communicate without friction. This enables organizations to streamline and automate workflows, improve communication between various business units and boost performance through more efficient operations.

asd

asd (the Aerospike Daemon) is the Aerospike database process that runs on a server or node.

B

batch operations

Batch operations are repeating computing tasks that can be kicked off and left unattended until they run to completion. The term batch operations arose when punched cards were used to tell computers what to do when performing more than one program. When multiple directions were needed, these cards were run in batches.

In the database world, batch operations refers to processing a large number of similar tasks together (batch reads and batch writes are the most common) instead of processing each task separately. Batch updates also fall under batch operations: they are sets of multiple update statements submitted to the database for processing as a batch.

Batch operations usually save compute resources and time because executing a hundred (or a million) individual reads or writes typically takes much longer than executing those operations as a batch.
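
To make the difference concrete, here is a minimal, hedged sketch using the Aerospike Python client, assuming a local cluster at 127.0.0.1:3000 and an illustrative test namespace with a users set; get_many is one of the client's batch-read calls (newer client versions also offer batch_read).

```python
import aerospike

# Connect to an (assumed) local single-node cluster.
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

# Build all the keys up front, then fetch them in one batch call
# instead of issuing a hundred separate reads.
keys = [('test', 'users', 'user-%d' % i) for i in range(100)]
records = client.get_many(keys)

for key, meta, bins in records:
    if meta is not None:  # meta is None when a record was not found
        print(key, bins)

client.close()
```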

big data

Big data refers to the trend of managing and processing the increasingly large structured and unstructured datasets that are available to businesses. Big data has been characterized by the alliterative “4 Vs of Big Data” – volume, velocity, variety and veracity (sometimes a fifth V is added for “value”). Big data also arrives from many different sources (vehicles, wearables, appliances, artificial intelligence), making it a challenge for traditional relational databases to handle with low latency.

Big data is important because of how it can be used and the vast and growing collection of new, exciting use cases that it has inspired. Through analysis, the data can show how to improve business inefficiencies, predict user and market behaviors, or to create new revenue streams and markets. Businesses can use big data to figure out why a product or service failed, detect fraud early and recalculate risks. More and more data is used in machine learning and artificial intelligence applications, which in turn will drive further data growth.

Examples of big data use cases include social media analysis, stock exchange simulations, and the analysis of complex systems and machines such as jet engines, oil derricks, and traffic systems. The application of big data is nearly limitless in scope and potential.

bin

In the Aerospike database, each record (similar to a row in a relational database) stores data using one or more bins (like columns in a relational database). The major difference between bins and RDBMS columns is that you don't need to define a schema. Each record can have multiple bins. Bins accept these data types (which are also referred to as "particles" in documentation and messages about bins):

  • Boolean
  • Bytes
  • Double
  • Geospatial
  • HyperLogLog
  • Integer
  • List
  • Map
  • String

For information about these data types and how bins support them, see "Scalar Data Types".

Although a bin in a given record or object must be typed, bins with the same name in different records do not have to hold the same type. There are some internal performance optimizations for single-bin namespaces.
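
As an illustration, here is a hedged sketch with the Aerospike Python client that writes one record whose bins hold several particle types; the host address, namespace test, and set users are assumptions for the example.

```python
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

key = ('test', 'users', 'alice')      # (namespace, set, userKey)
bins = {
    'name': 'Alice',                  # String
    'age': 34,                        # Integer
    'score': 91.5,                    # Double
    'active': True,                   # Boolean (recent server/client versions)
    'tags': ['a', 'b'],               # List
    'attrs': {'tier': 'gold'},        # Map
}
client.put(key, bins)                 # no schema: bins are created on write

_, _, record = client.get(key)
print(record)
client.close()
```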

C

C client

A C client is a library or application written in C, a popular and widely used structured programming language. C often serves as a base language; once it’s learned, other languages come more easily because many of them are based on C’s concepts.

C is often used in embedded systems, operating systems such as Microsoft Windows and in developing desktop applications and mobile phone operating systems. MySQL is the most popular database software that is built using C.

C# client

A C# (C sharp) client is a library or application that uses the C# programming language to build secure and robust applications that run on the .NET framework.

The C# programming language is object-oriented and general-purpose, intended for software components and appropriate for deployment in distributed environments. In the Aerospike system, it works well for applications on hosted and embedded systems, whether large and sophisticated or small with dedicated functions.

client

A library included by the user's application, which provides an API that allows the application to perform operations against the Aerospike database cluster. In our documentation, client, API, and application are used interchangeably. The client is written in a language such as Java, C, C#, Go, Python, Node.js, Ruby, or Rust.

cloud based database

A cloud based database is one that is built, deployed and accessed in a public, private or hybrid cloud. A cloud based database has similar functions to a traditional database, but offers greater flexibility with cloud computing.

Among the other benefits of a cloud based database is that users can host databases without buying dedicated hardware. It can be managed either by the user or by the provider, and it is easy to access through a web interface or an API provided by the vendor.

In addition, a cloud based database can support both relational and NoSQL databases. Its storage can be scaled up to handle growing demand or be decommissioned quickly if a project is abandoned.

cloud managed services

Cloud managed services are a set of applications or utilities that are provided to end users typically via a web interface in the cloud. These services can provide a wide range of business or technical functions that hide the complexity of cloud platform management and control. This can include migration, maintenance and optimization.

Cloud managed services can help businesses achieve greater agility and efficiency with essential business processes without having to hire and train a technical team to keep those systems running.

Cloud managed services typically operate in public and hybrid cloud environments. One organization may decide to have its entire infrastructure in the cloud, while another may only want its CRM solution there. Cloud managed services can take on various tasks, such as engineering on demand, operations management, continuous support, hosting and implementation.

The benefits of cloud managed services include lower infrastructure costs, elastic scalability, more predictable pricing, automatic upgrades, disaster recovery support, and enhanced availability and security.

cluster

Aerospike is a distributed database made up of a collection of one or more database nodes: a cluster. The nodes act together to distribute and replicate both data and traffic. Client applications use Aerospike APIs to interact with the cluster, rather than with individual nodes, which means the application does not need to know the cluster configuration. Data in the cluster is evenly distributed to ensure even resource consumption across the nodes. As you add or remove nodes, the cluster dynamically adjusts without needing any application code or configuration changes.

cross-datacenter replication

Cross-datacenter replication (XDR) lets data be reproduced – or replicated – across clusters that can be located in different clouds and various data centers.

The replication is used to guard against data-center failure. It’s also used to supply high-performance access to globally distributed applications that are mission critical. Cross-datacenter replication guarantees continuous service because if one of the data centers has a problem, there is backup data in another center.

Once replication is established, it runs continuously until paused or deleted.

The telecommunications industry relies on cross-datacenter replication because data availability, consistency, resilience and low latency are critical.

D

data intensive applications

Data intensive applications handle large quantities of data (multiple terabytes and petabytes) that can be complex and distributed across various locations. Data intensive applications process data in multistep analytical pipelines, including transformation and fusion stages.

Some examples of data intensive applications include stock trading applications, user behavior analysis, market simulations, and digital marketing. A stock trading application needs user account information access and also information about the market and portfolios. In digital marketing, there may be several campaigns running at one time, in addition to using demographic information to target specific ads to specific consumers.

When looking at data intensive applications, it’s important to consider the optimal methods of handling high volumes of different types of data, scalability, resilience and security.

data pipeline

In analytics, a data pipeline is a collection of systems that covers the entire data journey: from extraction from the data sources, to ingestion into a file system, database, or storage service, to the ETL systems that transform and prepare data for analysis, to the analytics data processing engines, and finally to the output of data to dashboards, BI tools, and data applications.

Many organizations have several, even hundreds of data pipelines that service different lines of business or use cases. Effective design and implementation of data pipelines helps organizations gain better and more insights by effectively capturing, organizing, routing, processing and visualizing data.

As more data becomes available from more sources, creating effective data pipelines is essential to connecting and coordinating different data sources, storage layers, data processing systems, analytics tools and applications. Since data scientists, developers and business leaders may all want to work with data in different ways, a flexible data pipeline architecture is essential so that relevant details for each team can be gathered, stored and made available for whatever analysis is needed.

The design goals of an effective data pipeline architecture are that it is scalable, flexible, cost-effective and optimized for a wide variety of analytical tasks.

data pipeline tools

Data pipeline tools are used to automate data extraction, cleaning and loading in order to make the process more efficient, reliable and secure. They make ingestion from various data sources to a single destination easier and more consistent.

There are free and open-source (FOSS) data pipeline tools that can be customized to fit specific use cases. However, FOSS tools can be more difficult to scale, and they lack commercial technical support.

Data pipeline tools are important because they take massive amounts of raw data and transform it into data that is ready for analytics, data apps and machine learning systems. For example, data pipeline tools can be used to deliver sales data to sales and marketing as part of a customer 360 initiative, or recommend financial services to a small business owner.

data storage layer

A data storage layer is where your gathered data is stored and saved for when it is needed. There are four layers in data warehouse architecture: data source layer, data staging layer, data storage layer and data presentation layer. The data storage layer makes it easier to back up files to ensure they remain safe and can be recovered quickly if computer hackers strike or there is some sort of outage.

In the data storage layer, the data is cleaned, transformed and prepared with a specific structure. This enables access by those within a business who require the data for various reasons.

data synchronization

Data synchronization is required when two or more systems want to access and manipulate the same datasets with accuracy and consistency. Data synchronization can take place in memory in the case of a traditional relational database, or it may be required with datasets that are widely distributed – in different cities, regions, or data centers.

In order to achieve effective data synchronization, a database/data platform must prepare and cleanse data, check for errors or duplication and then ensure consistency before it can be distributed, replicated, and synchronized. This is important because if synchronized data is changed by any replica, those updates must be reflected throughout the system to avoid errors, prevent fraud, protect private data and deliver accurate, up-to-date information and insights.

Data synchronization is becoming more vital as the population grows mobile and globalization continues. Data synchronization is also important with the growing accessibility to cloud-based data.

Some of the data synchronization methods include data replication in databases, file synchronization – typically used for home/cloud backups – and version control methods to synchronize files that might be changed by more than one user simultaneously. A distributed file system usually requires that devices be connected in order to sync multiple file versions. Mirror computing provides different sources with the same copy of the data set.

DBaaS

DBaaS (database as a service) is a cloud computing managed service that provides various database services without having to understand the underlying hardware, software, or database operations.

DBaaS providers host the database infrastructure and typically provide a web interface to add and query data, although they often also provide access to the data via standard tools or special APIs. These providers take care of scalability, resilience, restoration, security and maintenance. They often offer 24/7 support and geo-replication for availability and backups.

The benefits of DBaaS are that it is simpler to deploy, more immediate, and sometimes more cost-effective. This can lead to faster deployments for developers and businesses and greater agility in business operations. DBaaS can be an attractive option for small businesses and startups that do not own data centers or racks of computers.

demand-side platform

A demand-side platform is a marketing automation tool that helps mobile advertisers buy mobile, search and video ads from a marketplace where publishers list ad inventory. A demand-side platform provides a way for managing ads across various real-time bidding networks.

Demand-side platforms run independently of networks like Facebook or Instagram. As third-party software, they provide advertisers with one place to buy, analyze and manage advertising across many networks.

One of the advantages of using a demand-side platform is greater efficiency. Since advertisers only have to use one dashboard, more information is available than from a single network. Ads can also be better targeted using the available data, which can lead to higher conversion rates.

Demand-side platforms are considered an important tool to mobile marketing because they are automated and provide a way for campaigns to easily be set up and managed. In addition, campaign performance can be seen in real time, providing a way for advertisers to make changes as needed to gain the greatest benefit.

digest

The unique object identifier for a single record, a digest is a hash of the record's set and userKey. Record keys are hashed using the RIPEMD-160 algorithm, which takes a key of any length and always returns a 20-byte digest. By default the record saves the digest but not the key, which saves storage for long keys. For example, using only the digest for a 200-byte key improves wire performance and storage by saving 180 bytes.
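
The fixed 20-byte output can be illustrated with a conceptual Python sketch. This is not Aerospike's exact input encoding; it only demonstrates that RIPEMD-160 returns 20 bytes regardless of key length (hashlib's ripemd160 also depends on OpenSSL support).

```python
import hashlib

def toy_digest(set_name: str, user_key: str) -> bytes:
    # Requires RIPEMD-160 support in the underlying OpenSSL build.
    h = hashlib.new('ripemd160')
    h.update(set_name.encode('utf-8'))
    h.update(user_key.encode('utf-8'))
    return h.digest()                 # always 20 bytes, whatever the key length

print(len(toy_digest('eu-users', 'foo@gmail.com')))   # -> 20
print(len(toy_digest('eu-users', 'x' * 200)))         # -> 20
```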

distributed SQL

Distributed SQL is the ability to query a single logical relational database across multiple servers (clusters) with standard SQL syntax. Distributed SQL databases have strong consistency across clusters, data centers, or other geographic/availability zones. Distributed SQL databases are important because they are capable of scaling out quickly by adding additional cluster nodes and can therefore handle very large datasets.

Distributed SQL is suited to use cases where dramatic surges and troughs of activity are common, such as ecommerce sites that experience large surges of activity during holidays, or betting sites that experience an avalanche of activity during large sporting events.

Distributed SQL databases provide ample headroom and enough capacity to handle such sudden high demand at optimal operating cost, scaling infrastructure back down after the big game or holiday. Other uses for distributed SQL include streaming media that requires large amounts of data to customize offerings for users. The flexibility offered by distributed SQL can help eliminate downtime and lead to cost savings when users can quickly scale up or down, depending on their needs.

E

edge data

Edge data is data that is created as a result of edge computing processes, which is done at or near the physical location of the user or the source of the data. Being at the edge has connotations of limited network bandwidth and being outside the perimeter of data centers and the cloud.

The benefit of placing computing services closer to locations such as bank branch offices – or even oil derricks – is that local users and analysts get immediate and more reliable services and insights. With an effective edge computing mechanism, data can be cleansed and transformed at the point of origin, thus reducing the amount of ETL work done by core systems.

As more organizations with remote locations are trying to handle growing data volumes, edge computing provides a way to apply storage and compute resources in the most efficient and cost effective manner.

F

five-nines uptime

Five-nines uptime – or 99.999% – refers to the amount of time a network or service is available to users or other systems over a certain period, usually a year. This means there will be about 5.26 minutes of total downtime, either planned or unplanned.

  • Six-nines (99.9999%) = 0.53 minutes
  • Five-nines (99.999%) = 5.26 minutes
  • Four-nines (99.99%) = 53 minutes
  • Three-nines (99.9%) = 8 hours, 46 minutes
  • Two-nines (99%) = 3 days, 15 hours, 36 minutes

Five-nines uptime is achieved by adding redundancy, failover, and fast restart/respawn of processes so that no single component failure, or combination of component failures, can crash the entire system. Steps are also taken to ensure the crossover between redundant systems doesn’t become a failure point. Availability is further enhanced when failures are detected as they occur and reliance on staff is reduced in order to cut human error.

Five-nines uptime is becoming more critical for organizations that rely on high operational performance, such as hospitals or data centers. Practically speaking, five-nines and above uptime is considered “always on”.
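
The downtime figures in the list above follow from simple arithmetic: allowed downtime is (1 − availability) multiplied by the minutes in a year. A small Python sketch reproduces them:

```python
# Allowed downtime per year = (1 - availability) * minutes in a year.
MINUTES_PER_YEAR = 365 * 24 * 60      # 525,600

for label, availability in [('six-nines', 0.999999), ('five-nines', 0.99999),
                            ('four-nines', 0.9999), ('three-nines', 0.999),
                            ('two-nines', 0.99)]:
    downtime = (1 - availability) * MINUTES_PER_YEAR
    print(f'{label}: {downtime:.2f} minutes of downtime per year')
```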

G

Go client

A Go client is a library or application written in Go, a programming language initially built for networking and infrastructure programs. Initially aimed at replacing Java and C++, Go is now used for a wide range of applications. It is popular for cloud-based or server-side apps and is favored for DevOps and site reliability automation.

Go is considered easy to learn and understand and has an active open-source community that develops libraries and lends support to other users.

Go is also a high performing programming language that can handle large-scale automation.

Infrastructure tools like Kubernetes, Docker and Prometheus are written in Go. In addition, cloud platforms provide ample support to those using Go.

H

high availability database

A high availability database is a database that is designed to operate with no interruptions in service, even if there are hardware outages or network problems. High availability databases often exceed even what’s stipulated in a service level agreement.

A high availability database ensures greater uptime by eliminating single points of failure, ensuring a reliable crossover between redundant systems, and detecting failures as they occur, whether from environmental problems or from hardware or software faults.

Typical high availability database features include server or node failover, hot standby, data replication and distributed microservice architecture.

Many businesses today have critical databases and applications, such as data warehouses and ecommerce applications that require high availability. High availability databases are important to reduce the risk of losing revenue or dissatisfied customers.

hotkey

A hotkey (also hot key or hot-key) is a specific key subjected to a large number of read/write operations in a short time window. This can occur when multiple clients or processes attempt to access or modify the same data element simultaneously, leading to a concentrated workload on a single node. When a server node receives too many concurrent requests for the same key, it may reject the request with a KEY_BUSY error. This also increments the fail_key_busy statistic for monitoring such scenarios.
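
One common client-side mitigation is to retry with backoff when the server rejects a request on a hot key. The sketch below assumes the Aerospike Python client and that it surfaces the KEY_BUSY error as aerospike.exception.RecordBusy; verify the exception name and namespace/set names against your client version, as they are assumptions here.

```python
import time
import aerospike
from aerospike import exception as ex

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()
key = ('test', 'counters', 'popular-item')   # a hypothetical hot record

for attempt in range(5):
    try:
        client.increment(key, 'count', 1)    # contended write on the hot key
        break
    except ex.RecordBusy:                    # assumed mapping of KEY_BUSY
        time.sleep(0.01 * (2 ** attempt))    # exponential backoff, then retry
else:
    raise RuntimeError('record still busy after retries')

client.close()
```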

hybrid storage

Hybrid storage is a storage strategy that blends the use of flash storage, solid state drives (SSDs) and mechanical disk drives in order to provide the optimal combination of cost and performance for a given set of workloads. A hybrid storage approach enables many different applications and use cases to get the storage performance they need at the right price point.

One of the benefits of hybrid storage is that it enables organizations to leverage high performance storage – such as flash drives or SSDs – when it is needed. Organizations can determine whether data is hot, warm or cold and then choose the most appropriate storage medium for the application. This enables businesses to plan how and when data will be used to achieve the greatest impact and efficiency.

Hybrid storage can sometimes be implemented in a single storage system. This offers users a single point of accountability for hardware and software issues. This can be important when businesses are looking for greater efficiency when data volumes are increasing and storing everything on flash storage can be too expensive.

J

Java client

A Java client is a Java application written to execute in a Java Virtual Machine (JVM) on a client device – typically a desktop, mobile device, or other endpoint. Because it is designated as a client, it typically provides a user interface and connectivity to a backend service, often written for a JVM configured to run on servers.

Java clients – like Java itself – are ubiquitous and can be found as part of social media applications, kiosks, mobile devices, smart vehicles and more. Online banking uses Java clients to provide customers a way to easily handle their finances through their various mobile devices, laptops, or desktops.

The Java language and runtime (JVMs, Java clients) were designed to be portable (“write once, run anywhere”) and can handle different device profiles – different screen sizes, network protocols, etc. That portability enables rapid deployment of applications, with the data produced by the app consistently available online. Other benefits include easy integration into any app or website, availability to a wide range of users, and customization and adaptability in accepting updates and changes.

K

key

A key uniquely identifies a single record in the namespace, similar to how a primary key in an RDBMS identifies a single record in a table. The key is the distinct (set, userKey) pair in a specified namespace. The userKey data type can be a string, integer or bytes (blob). For example, in a namespace user_profiles, a specific user record can be identified by the key (eu-users, 'foo@gmail.com').
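
With the Aerospike Python client, the key from the example above is expressed as a (namespace, set, userKey) tuple; the host address and bin contents below are illustrative assumptions.

```python
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

# (namespace, set, userKey) -- the example from the text.
key = ('user_profiles', 'eu-users', 'foo@gmail.com')
client.put(key, {'plan': 'free'})     # illustrative bin contents
_, meta, bins = client.get(key)       # the same key always finds the same record
print(meta, bins)
client.close()
```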

key value NoSQL database

A key value NoSQL database (NoSQL standing for “non-SQL” or “not only SQL” and pronounced “no sequel”) is a non-relational database that uses a key-value method to store data as a collection of key-value pairs in order to get fast lookup results on very large datasets. The key serves as the unique identifier. A key value NoSQL database is considered the simplest type of NoSQL database.

A key value NoSQL database offers rapid data storage (writes) and information retrieval (reads) because of its simple data structure and lack of a predefined schema. It also performs well because of commonly integrated caching features that enable users to store and retrieve data very quickly. Because of its relative architectural simplicity, a key value NoSQL database can scale out quickly in cloud environments without causing operational disruptions.
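
The model itself can be shown with a toy Python sketch, using a plain dictionary as a stand-in for a key-value store: every value is reached in one hashed lookup by its unique key, with no schema and no joins.

```python
# A plain dict standing in for a key-value store: one hashed lookup
# per key, no predefined schema, no joins.
store = {}
store[('users', 'u42')] = {'name': 'Ada', 'visits': 7}   # write (put)
record = store.get(('users', 'u42'))                     # read (get) by key
print(record)   # -> {'name': 'Ada', 'visits': 7}
```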

L

low latency algorithmic trading

Low latency algorithmic trading is a process for carrying out orders using automated, pre-programmed trading instructions that account for different prices, timing and volume. Faster execution is achieved through low latency, which delivers data in under a millisecond so that decisions can be made faster.

Many factors can affect the latency of algorithmic trading, such as the distance between the exchange and the trading system and the efficiency of the trading system architecture. This architecture might include network adapters, the operating system choice, code efficiency and programming language.

Algorithmic trading is mostly used by institutional investors and big brokerage firms to reduce the expense associated with trading.

M

migration

When nodes are added or removed from a database cluster, data migrates between the nodes. Once migrations are complete, the data in the new cluster is once more evenly distributed.

multi-cloud environment

A multi-cloud environment is one in which more than one cloud computing platform is used. It might be a combination of public, private or edge clouds. These clouds may be used in various combinations in order to distribute applications and services.

A multi-cloud environment, for example, might be used to speed up the delivery or transformation of apps. Or, an enterprise might consider spreading across different clouds so that it’s not dependent on one vendor. Some applications, such as logistics, retail and manufacturing may need to be distributed at the edge to be physically closer to users and deliver faster results or better customer experiences.

multi-model database

A multi-model database is a database management system that meshes different kinds of database models into one integrated database engine. This provides a single back end database that can service a wider range of data processing and data retrieval tasks and use cases. This differs from most database management systems, which are organized around a single data model (e.g. relational, document, or graph) that decides how data is organized, stored and manipulated.

A multi-model database can accommodate object-oriented, key-value, relational, wide-column, document and graph models. Since multi-model databases typically don’t store all their data in tables like traditional relational databases, they can store structured, semi-structured and unstructured data types. This keeps data consistent without fragmenting it across separate specialized databases.

A multi-model database can also do fundamental tasks like storing data, indexing and querying. Most multi-model databases are also ACID (atomicity, consistency, isolation, durability) compliant and have frictionless integration with most of the latest database models. Data can be integrated from various sources and in many formats.

N

namespace

Aerospike database clusters contain one or more namespaces, similar to a tablespace in an RDBMS. Namespaces segregate data with different storage requirements. For example, some data may have high performance/low storage requirements more suitable for RAM, while other data can be stored on SSD storage. The Aerospike schemaless data model allows you to mix data types within a namespace. You can store data on users and URLs in the same namespace, and separate them using sets.
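
For example, the mixed users-and-URLs case in the text might look like this with the Python client. The namespace name test and the host address are assumptions, and namespaces themselves are defined in the server configuration rather than by the client.

```python
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

# Two kinds of data in one namespace, separated by set name.
client.put(('test', 'users', 'alice'), {'email': 'alice@example.com'})
client.put(('test', 'urls', 'home'), {'url': 'https://example.com'})

client.close()
```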

NewSQL

NewSQL (pronounced new ess-cue-ell or new sequel) is a relational database management system (RDBMS) that aims to provide NoSQL system scalability while also maintaining the consistency of a traditional database system.

NewSQL combines ACID (atomicity, consistency, isolation and durability) compliance with horizontal scaling for online transaction processing workloads. Enterprise systems that handle data, such as financial and order processing systems, are too big for a traditional relational database. At the same time, these enterprise systems aren’t practical for NoSQL systems because they have transactional and consistency requirements. NewSQL provides the scale and reliability without requiring more infrastructure or development expenditures.

NewSQL uses SQL to ingest new information, execute transaction processing at a large scale, and change the contents of the database. The main categories of NewSQL include new architectures, transparent sharding middleware, SQL engines and database as a service (DBaaS).

node

An Aerospike database cluster is made of one or more nodes. These are the individual servers that act together as a distributed database to fulfill client requests.

NoSQL .NET database

A NoSQL .NET database (pronounced “no sequel dot net”) is a NoSQL database written in .NET, a no-cost, open-source, cross-platform framework for building different applications. .NET enables different languages, editors and libraries to build for mobile, web, games and the Internet of Things (IoT).

For example, .NET applications can be written in C#, F# or Visual Basic. Whatever the chosen language, .NET runs natively on any compatible operating system, enabling many different types of apps to be built. .NET also has a set of base class libraries and APIs that are common to all .NET applications.

.NET is popular with software developers and was built by Microsoft for building many different types of applications.

NoSQL database design

NoSQL database design is focused on how an application will query the data, rather than concentrating on the relationships within the data.

NoSQL database design stresses access patterns over abstract data models. That’s why best practices for NoSQL database design call for a graph of the ways that applications will query the data, and the necessary workload support.

NoSQL database design also looks at how often the dataset will be changed, how much data will be stored and the requirements for availability, performance and consistency.

NoSQL database design means choosing the right type of database for a certain application. These database types can be key-value stores, wide-column stores, document databases and graph databases.

NoSQL document database

A NoSQL document database is a NoSQL database (standing for “non-SQL” or “not only SQL” and pronounced “no sequel”) that can store, retrieve, and manipulate document-oriented (also known as semi-structured) information. Document databases are more efficient, intuitive, and flexible at handling this semi-structured data than relational models because relational databases must convert documents into relational tables (rows and columns) to store and manage them. Modern NoSQL document databases are also designed to scale out in server clusters and cloud infrastructure.

Instead of storing data in fixed rows and columns, document databases use flexible data models like JSON or JSON-like data structures. The semi-structured nature of document data means that every document object in the database can have a unique structure. This means that users can add new objects without changing the entire database. In addition, users can customize documents to have the same or different structures.

NoSQL graph database

A NoSQL graph database (standing for “non-SQL” or “Not only SQL” and pronounced “no sequel”) is designed to handle huge sets of structured, semi-structured or unstructured data. A NoSQL graph database can integrate heterogeneous data from a variety of sources and make links between different datasets. It does this by focusing on the relationships between different entities and then inferring new knowledge from the information on hand.

The NoSQL graph database is more flexible than a relational database, and also considered more dynamic and less expensive. Its ability to handle massive loads of unstructured data that can come from areas such as the Internet of Things (IoT) is also considered an advantage.

O

operational workload

Operational workload refers to an application’s ongoing work and what it is being asked to do. When considering an operational workload, issues that are considered include what data is being processed, how that data is processed and whether it is in a structured or unstructured environment.

Other considerations to determine an operational workload can include the data volume during a specific period, how much effort has to be put into it and the time it will take to repeat that effort. Operational workload will look at duty cycles and working set sizes to make these determinations.

Determining the correct operational workload is important in order to create a more effective design and operation, while also optimizing the workloads.

P

particle

Synonym for "data type" in documentation and messages referring to bins. For example, "Boolean particle" means "Boolean data type" in reference to bins.

PHP client

A PHP client is an application written in PHP, an open-source, general-purpose scripting language that can be used for a variety of projects on dynamic and interactive websites. PHP is often used as part of ecommerce and customer-relationship management (CRM) web applications.

PHP is one of the easiest programming languages to learn, with its forgiving syntax, plethora of resources and plenty of documentation. PHP also offers security with its data encryption and access restrictions.

PHP was one of the first server-side languages that could be embedded into HTML, making it easier to add functionality to web pages without having to call external files for data. It continues to have regular upgrades.

policy

Policies control the behavior of individual operations against the database, such as reading records or performing read and write operations on distinct data types within a record. Policies also dictate the operational behavior of a namespace or of the entire database node or cluster.
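
In client code, a policy is typically passed per operation. The hedged Python sketch below shows per-call policy dictionaries; field names such as total_timeout vary by client version and should be checked against your client's documentation, and the key and bin values are illustrative assumptions.

```python
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()
key = ('test', 'users', 'alice')

# Per-operation policies: a 1-second budget for the write and a
# tighter 500 ms budget for the read. Field names are
# version-dependent assumptions.
client.put(key, {'plan': 'pro'}, policy={'total_timeout': 1000})
record = client.get(key, policy={'total_timeout': 500})
print(record)
client.close()
```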

Python client

A Python client refers to an application that is written in Python, a popular programming language. Python is popular because of the high level of abstraction available and the extensive library support. A Python client library is a piece of pre-written code, packaged as a module.

The Python language is considered intuitive to use. It’s easy to read, learn and write with a syntax similar to English. Because it’s simple to use, it improves productivity. It’s also cost-effective since it’s free and open-source.

Python is often used for cross-platform development because it is widely available. Because of its extensive library, it’s beneficial in developing interactive games, web applications, and machine learning systems.

Q

query languages

Query languages are programming languages for searching a database or dataset, changing its contents, or retrieving information. ANSI SQL is the best known and most widely used query language, but the Big Data revolution introduced many more specialized query languages – especially for NoSQL databases. While early query languages required database expertise to use, the interfaces have evolved and made it possible for anyone to access database information.

The main types of query modes are the menu (choose from a prescribed list), the fill-in-the-blank technique (use keywords in the search feature) and the structured query. The structured query is often used with relational databases and has a formal syntax that is considered a programming language.

Another of the query languages is natural language, which is seen as the most flexible and is allowed in some commercial database management software. This natural query language looks for action words and synonyms, and identifies the names of files, records and fields.

R

real time data analytics platforms

Real time data analytics platforms give organizations a way to use their real-time data by enabling extraction of valuable information and trends. Real time data analytics platforms provide better analytics and visualization by connecting data sources.

By measuring data in real time, businesses can make decisions based on the latest information. Because real time analytics was once time-consuming and expensive, it was used in only the most mission-critical cases. But now, the growth of real-time data from all kinds of connected devices and the use of the cloud means real-time analytics is much more accessible.

Real-time analytics platforms serve a wide variety of industries. For example, the logistics industry can use them to track shipments and optimize routes. More organizations need data faster to make better predictions and stay ahead in a hyper-connected, increasingly competitive world.

real time data management

Real time data management entails real-time processing that handles workloads that continually fluctuate. For example, a stock market requires real time database management because it’s always changing. This is different from a traditional database with persistent data that isn’t usually affected by time.

Real time data management is also used in other industries such as banking, law, medical records, multimedia, accounting, reservation systems and scientific data analysis. These databases require speed so that data can be processed, results provided and immediate action taken. For example, an airport radar system needs data to be immediately processed so that it’s clear in real time where various airplanes are located.

real time database

A real-time database is a system using real-time processing to handle ever-changing workloads. Real-time databases are traditional databases that are used in fields such as banking, law, medical records, multimedia and science. A stock market is an example of a real-time database because it is dynamic and changes rapidly.

The term real-time database applies to databases that handle data streaming in real time, including in-memory data grids, in-memory databases, NewSQL databases, NoSQL databases and time-series databases.

One of the benefits of a real-time database is being able to store data that enriches streaming data. A real-time database also enables continuous queries to process ongoing events from people, apps and machines. Instead of the data growing stale, it can be used immediately.

real time engine

A real time engine powers interactive computer environments that respond to input as it happens. It is often associated with gaming or virtual reality environments.

A real-time engine must have extremely high throughput with low latency and be able to scale out easily. Real time computing requires a guaranteed response – often under a millisecond. For example, a real time engine may be needed to deliver results during a live online gaming event and be able to handle a surge in traffic during peak hours or special promotions.

Other uses requiring a real time engine can include safety-critical applications such as anti-lock brakes, which require the proper mechanical response in real time.

real time web applications

Real-time web applications are apps that enable interactive usage by users, systems, or applications. They operate within a time frame of under a second, or even a millisecond, enabling users to get information as soon as they ask for it. This means that users do not have to check on the information themselves or rely on software to check periodically for updates.

Real-time web applications can be things like instant messaging, gaming, status updates, alerts, and dashboards. The term real-time is often debated, since the acceptable response time varies by application.

record/object

A record (or object) is similar to a row in an RDBMS. It is a contiguous storage unit for all the data uniquely identified by a single key. A record is subdivided into bins (like columns in an RDBMS).

Ruby client

A Ruby client refers to an application written in Ruby, a dynamically typed programming language often used with modern APIs and single page apps. It is considered less popular than Python or Java for these purposes.

Ruby is easy to use and has an open source community, which makes it cost effective for many. Ruby is considered very flexible, so it’s in high demand for web development, scripting, data processing, DevOps and static site generation. It’s also a highly portable, cross-platform language: code written on one operating system will run on others such as Linux, macOS and Windows.

S

server footprint

A server footprint is the amount of space – either physical or online – that computer hardware or software occupies. This might entail equipment such as servers, switches, routers and storage in a facility. For software, it might include how much memory is required to run a program.

Server footprints are increasing with digital transformation and more devices supplying data or demanding online connections, such as streaming channels, online banking or smart vehicles. Some organizations are looking to reduce their server footprint to lessen the impact on the environment and save money.

set

A set is similar to a table in an RDBMS, except that you don't have to define a schema. A set isn't a distinct storage unit, but instead it is a collection of records within a namespace (the namespace does have its own dedicated storage).

SQL distribution

SQL (pronounced “sequel” or “ess que ell”) distribution means a single logical database is deployed on a cluster of servers in one or more data centers. The SQL distribution is known for strong consistency, high availability, resiliency and distributed use of data across different geographic environments.

SQL distribution provides a seamless developer and customer experience. For example, developers don’t have to worry about ACID (atomicity, consistency, isolation, durability) compliance or complex joins. Users can rely on better performance and scalability as data grows.

SQL distribution does, however, require a strict schema and structured data.

SQL server database as a service

SQL (pronounced “ess que ell” or “sequel”) server database as a service (DBaaS) enables customers to use SQL servers without buying additional hardware or having to make complicated database deployments. DBaaS uses a simple model to deploy necessary applications.

supply-side platform

A supply-side platform is an advertising technology platform used by publishers to manage, sell and optimize their ad space (or inventory), via websites and mobile applications. These ads can be video or display ads.

Supply-side platforms connect directly to ad networks, data-management platforms, demand-side platforms and ad exchanges to sell ad inventory for websites and app owners.

Supply-side platforms are beneficial for publishers who may be managing complex and volatile programmatic ad purchases with different ad networks at one time. These platforms help ensure the various requirements and limitations for those ad networks are met.

Supply-side platforms may use advanced algorithms to predict which network provides the most effective results during a certain period.

T

time based graph database

A time based graph database (or time series database) is one that is built specifically for handling metrics, events or measurements that are time-stamped. It stores nodes and relationships instead of tables or documents.

Graph databases are often used in fraud detection and recommendation engines. A graph database helps relate a potential purchaser, personal information such as an email address, and the purchases the user is making to those of others with common interests.

Further information can be gathered from the time series, which can track and aggregate measurements or events. Examples include clicks, trades in a market, or application performance metrics. While financial data was one of the initial uses of a time series database, the focus has grown as sensors are included in everything from cars to microwaves to phones.

transactional workload

A transactional workload means that over time, the database is getting requests for data and various changes to that data from different users. The modifications that are made are known as transactions.

For example, a transactional workload is what banking or accounting systems are built to handle. Relational databases such as MySQL were designed for transactional workloads. They can scale as needed, ensure transactional consistency and deliver quick, responsive queries.

U

UDF

A User-Defined Function (UDF) is code written by a developer that runs inside the Aerospike database server. UDFs can significantly extend the capability of the Aerospike Database engine in functionality and in performance. Aerospike currently only supports Lua as a UDF language.
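
For example, with the Python client a Lua module can be registered and applied to a single record as sketched below; the module and function names (my_udfs, get_name) and the key are hypothetical, while udf_put() and apply() are the client calls involved.

```python
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

# Upload a (hypothetical) Lua module to the cluster, then invoke one
# of its functions on a single record.
client.udf_put('my_udfs.lua')                    # register the module
result = client.apply(('test', 'users', 'alice'),
                      'my_udfs',                 # module name (file minus .lua)
                      'get_name',                # Lua function in the module
                      [])                        # UDF arguments
print(result)
client.close()
```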