What to look for in a database for spatio-temporal analysis?
As sensor data, log data, and streaming data continue to grow, we are seeing a significant increase in demand for tools that can perform spatial and time-series analytics.
The cost of sensors and devices capable of broadcasting their longitude and latitude as they move through time and space is falling rapidly, and the number of such devices is growing accordingly. By 2025, projections suggest 40% of all connected IoT devices will be capable of sharing their location, up from 10% in 2020. Spatial thinking is helping innovators optimize existing operations and drive long-promised digital transformations in smart cities, connected cars, transparent supply chains, proximity marketing, new energy management techniques, and more.
But traditional analytic databases are not well-suited to dealing with spatial and time-series data, which can be complex and difficult to analyze. As a result, there is rising interest in new types of analytic databases that are specifically designed to handle this type of data. Such systems need advanced algorithms and specialized data structures to efficiently store and analyze spatial and time-series data, allowing businesses to gain valuable insights and make better-informed decisions.
Spatial data, also known as geospatial data, refers to data that has a geographic component, such as the location of a physical object or the shape of a geographical feature. Spatial data can be represented in many different forms, including as coordinates, points, lines, polygons, and raster images. This data can be collected using a variety of methods, such as through global positioning systems (GPS), remote sensing, and aerial or satellite imagery. Spatial data is often better analyzed with specialized databases.
Temporal data, also known as time-series data, refers to data that changes over time. This type of data is often used to track changes or trends and can be collected at regular intervals or at specific points in time. Examples include financial data, weather data, and mechanical readings such as vibrations or temperature. This data too has typically been stored and managed in specialized time-series databases.
A spatio-temporal database brings together features of both spatial and temporal databases to create a more powerful analytics framework for this type of workload. It can store and analyze data that changes over both time and space. This type of database is ideal for applications such as tracking the movement of objects, monitoring changes in geographic features, and analyzing the spread of disease. It provides a way to store and query data that is constantly changing, as well as the ability to display it in real time. Spatio-temporal databases are being deployed by innovators in telecommunications, logistics, defense, financial services, energy, transportation, retail, and healthcare.
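To make the idea of querying over both time and space concrete, here is a minimal Python sketch of a combined filter: it keeps only the location pings that fall inside a time window and within a given radius of a point. The `Ping` record, the `pings_near` function, and the sample coordinates are all hypothetical and chosen for illustration; a real spatio-temporal database would use indexes rather than scanning every row.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a sketch


@dataclass(frozen=True)
class Ping:
    """One hypothetical location reading from a moving device."""
    device_id: str
    ts: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def pings_near(pings, center_lat, center_lon, radius_km, start, end):
    """Keep pings with start <= ts < end that lie within radius_km of the center."""
    return [
        p for p in pings
        if start <= p.ts < end
        and haversine_km(p.lat, p.lon, center_lat, center_lon) <= radius_km
    ]


t0 = datetime(2023, 1, 1, 12, 0)
pings = [
    Ping("truck-1", t0, 38.8977, -77.0365),                        # close, in window
    Ping("truck-1", t0 + timedelta(hours=2), 38.8977, -77.0365),   # close, too late
    Ping("truck-2", t0 + timedelta(minutes=10), 40.7128, -74.0060),  # in window, far away
]

hits = pings_near(pings, 38.9, -77.03, 5.0, t0, t0 + timedelta(hours=1))
print([(p.device_id, p.ts.isoformat()) for p in hits])
```

Only the first ping satisfies both conditions. The point of the sketch is that neither the time predicate nor the distance predicate alone answers the question; the query needs both dimensions at once.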
While spatial and time-series functions have been “features” in conventional analytic databases for years, they have failed to produce breakthrough results due to performance and scale limitations. Spatial and temporal joins are particularly taxing on even the most advanced distributed, columnar, memory-first, cloud databases. Unlike traditional primary and foreign key joins (e.g., customer_id in table one joined to customer_id in table two), which match rows on equal values, a spatial join may involve matching a longitude and latitude in one table to a polygon in another, which requires a geometric test for every candidate pair. Just as the big data revolution was fueled by Web 2.0 data and a rethinking of the systems used to store and analyze it, new technology in the form of vectorized databases has emerged to satisfy the unique requirements of spatio-temporal analytics.
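The difference from a key-based join can be sketched in a few lines of Python. The example below joins points to polygons with a standard ray-casting point-in-polygon test. The zone names, point IDs, and coordinates are made up for illustration, and the nested loop is deliberately naive: it is not how any particular database implements spatial joins, but it shows why the work grows with points × polygons × edges when no spatial index is available.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: cast a horizontal ray from the point and count edge crossings.

    `polygon` is a list of (lon, lat) vertices; the last vertex connects
    back to the first. An odd number of crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def spatial_join(points, zones):
    """Naive join: test every point against every polygon."""
    matches = []
    for point_id, lon, lat in points:
        for zone_name, polygon in zones.items():
            if point_in_polygon(lon, lat, polygon):
                matches.append((point_id, zone_name))
    return matches


# Hypothetical data: a square zone and a triangular zone.
zones = {
    "zone_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "zone_b": [(5, 0), (9, 0), (7, 4)],
}
points = [("p1", 2, 2), ("p2", 7, 1), ("p3", 10, 10)]

joined = spatial_join(points, zones)
print(joined)
```

A key join can answer "which rows match?" with a hash lookup per row. Here every candidate pair needs arithmetic over each polygon edge, which is the kind of repetitive numeric work that benefits from being processed in bulk.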
Vectorized databases use vectorized query execution to boost performance. In contrast, conventional analytics databases typically process data on a row-by-row basis, which can be slower and require more computational resources. In a vectorized query engine, data is stored in fixed-size blocks called vectors, and query operations are performed on these vectors in parallel, rather than on individual data elements. This allows the query engine to process multiple data elements simultaneously, resulting in faster query execution and improved performance.
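The contrast between the two execution styles can be sketched in plain Python. The first function below applies a filter and a sum one value at a time; the second walks the same column in fixed-size blocks and applies each operation to a whole block before moving on. The function names and the block size of 4 are assumptions made for illustration (real engines use far larger blocks), and pure Python gains no hardware speedup from this arrangement. The sketch only shows the shape of the control flow that lets a compiled engine hand each block to SIMD or GPU instructions.

```python
from array import array

VECTOR_SIZE = 4  # hypothetical block size, kept tiny so the example is easy to trace


def sum_where_row_at_a_time(speeds, threshold):
    """Evaluate the filter and the aggregate once per value."""
    total = 0.0
    for s in speeds:
        if s > threshold:
            total += s
    return total


def sum_where_block_at_a_time(speeds, threshold, vector_size=VECTOR_SIZE):
    """Evaluate the filter over a whole block, then aggregate that block."""
    total = 0.0
    for start in range(0, len(speeds), vector_size):
        block = speeds[start:start + vector_size]
        # Step 1: one comparison operator applied across the entire block.
        mask = [s > threshold for s in block]
        # Step 2: one aggregation applied to the values the mask selected.
        total += sum(s for s, keep in zip(block, mask) if keep)
    return total


# A single column of readings stored contiguously, as a columnar engine would.
speeds = array("d", [10.0, 55.0, 72.5, 30.0, 90.0, 65.0, 20.0, 80.0, 45.0])

row_result = sum_where_row_at_a_time(speeds, 60.0)
block_result = sum_where_block_at_a_time(speeds, 60.0)
print(row_result, block_result)
```

Both functions return the same answer. What changes is that the block version separates the work into uniform steps over contiguous values, which is the property vectorized hardware can exploit.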
Vectorized databases combine the latest advances in GPUs and vectorized CPUs from Intel with software designed to process data in large blocks, allowing them to execute queries more quickly and efficiently. This can be particularly beneficial for complex queries and spatio-temporal joins that involve large amounts of data, as it can reduce the time and resources required to execute the query. Overall, vectorized databases offer improved performance and scalability compared to conventional distributed analytic databases.
Kinetica surpassed all other systems in a recent benchmark for time-series and spatial databases, as well as on TPC-DS benchmarks for real-time analytics. Last year, Intel’s Jeremy Rader, GM, Enterprise Strategy & Solutions for the Data Platforms Group, proclaimed, “Kinetica’s fully-vectorized database significantly outperforms traditional cloud databases for big data analytics.”
Kinetica is available as software or as-a-service in the cloud.