Bolt is a data management system for an emerging class of applications that help IoT devices interact and store data. The unique requirements of these applications, such as support for time-series and tagged data, the ability to share data between devices, and assurances of data confidentiality and integrity, make older platforms unsuitable. Platforms such as HomeOS and MiCasa Verde provide high-level abstractions mainly for device interaction, not for storage. The following paragraph describes the data manipulation characteristics of IoT applications, which are one of the main reasons for creating Bolt.
The observed data manipulation characteristics of IoT applications are: 1) there is a single writer, 2) applications always generate new data, 3) there is no random access to the data, and 4) applications retrieve proximate records from the data streams. Traditional databases, with their support for transactions, concurrency control, and recovery protocols, are overkill for such data, while file-based storage offers an inadequate query interface because filesystem access happens in sequential order. In addition, data needs to be shared between applications and secured both in transit and at rest on the storage medium. The storage system should also support policy-based storage, which helps minimize cost and use resources efficiently. Unlike existing storage abstractions, Bolt supports these data management characteristics. The next paragraph explains the key techniques Bolt uses to tailor data management to these applications.
The four key techniques are chunking, separation of index and data, segmentation, and decentralized access control with signed hashes. First, chunking is the process of grouping a contiguous sequence of records into chunks. It increases the efficiency of the system by reducing the round-trip delay incurred during data access, since records are transferred in batches. Data is stored and accessed at the granularity of chunks. Second, separating the index from the data helps in two ways: 1) the index can be queried locally, and 2) the cloud does not need to be trusted, because data is stored encrypted in the cloud and decryption happens only on the client side. Third, segmentation is the process of dividing a data stream into smaller segments of a user-defined size. It makes it possible to archive parts of a stream as the amount of data in the stream grows. Finally, Bolt uses decentralized access control and signed hashes to protect the confidentiality and integrity of data stored in untrusted cloud storage. It encrypts the data with the owner's secret key and distributes the keys via a trusted key server. The next paragraph gives an overview of Bolt's implementation.
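As an illustration of chunking, the following is a minimal Python sketch and not Bolt's actual code. It assumes records are (timestamp, tag, value) tuples sorted by time and that chunks hold a fixed number of records; the function names chunk_records and chunks_for_range are hypothetical. It shows why a temporal range query only needs to fetch the few chunks that overlap the requested range, each in a single batched transfer.

```python
from typing import List, Tuple

# Assumed record layout for this sketch: (timestamp, tag, value).
Record = Tuple[int, str, str]


def chunk_records(records: List[Record], chunk_size: int) -> List[List[Record]]:
    """Group a contiguous, time-ordered sequence of records into chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def chunks_for_range(chunks: List[List[Record]], start: int, end: int) -> List[int]:
    """Return the positions of the chunks whose time span overlaps [start, end]."""
    hits = []
    for position, chunk in enumerate(chunks):
        first_ts, last_ts = chunk[0][0], chunk[-1][0]
        if first_ts <= end and last_ts >= start:
            hits.append(position)
    return hits


# Ten hypothetical temperature readings with timestamps 0..9.
records = [(t, "temperature", str(20 + t % 3)) for t in range(10)]
chunks = chunk_records(records, chunk_size=4)

# A query for timestamps 3..5 touches two chunks instead of three records
# fetched one round trip at a time.
needed = chunks_for_range(chunks, start=3, end=5)
```

Larger chunks mean fewer round trips for a range query, at the cost of downloading some records the query did not ask for.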
Bolt's APIs allow applications to create data streams of two types: ValueStream and FileStream. The former is used for writing small data values such as temperature readings, and the latter for larger values such as images or videos. Data is added to a stream as a time-tag-value record using an append API. A stream consists of two parts: a log of data records (DataLog) and an index that maps a tag to a list of data item identifiers. When a stream is closed, Bolt chunks the segment's DataLog, compresses and encrypts the chunks, and generates a ChunkList. It then uploads the chunks, the updated ChunkList, and the index to the storage server. The chunks are uploaded in parallel, and the application can configure the maximum number of parallel uploads. Finally, the stream's integrity metadata is uploaded to the metadata server. As mentioned in the previous paragraph, streams are encrypted with a secret key known only to the owner. If the owner wants to grant access to other readers, it updates the stream metadata with the secret key encrypted under each reader's public key. When reading data, a reader first verifies the integrity of the metadata using the owner's public key and checks its freshness using the TTL in the stream metadata before downloading the index and the DataLog.
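The stream layout described above can be sketched roughly as follows. This is a toy Python model under stated assumptions, not Bolt's real API: the class ToyStream, its methods, and the is_fresh helper are hypothetical names, data item identifiers are assumed to be offsets into the DataLog, and the freshness check assumes the metadata records the time at which it was signed. Chunking, compression, encryption, and upload are left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyStream:
    """Toy model of a stream: an append-only DataLog plus a tag index."""

    def __init__(self) -> None:
        # DataLog: append-only list of (timestamp, value) records.
        self.data_log: List[Tuple[int, str]] = []
        # Index: tag -> list of data item identifiers (here, DataLog offsets).
        self.index: Dict[str, List[int]] = defaultdict(list)

    def append(self, timestamp: int, tag: str, value: str) -> int:
        """Append a time-tag-value record and return its identifier."""
        offset = len(self.data_log)
        self.data_log.append((timestamp, value))
        self.index[tag].append(offset)
        return offset

    def get_range(self, tag: str, start: int, end: int) -> List[Tuple[int, str]]:
        """Return records with the given tag whose timestamp is in [start, end]."""
        return [
            self.data_log[offset]
            for offset in self.index.get(tag, [])
            if start <= self.data_log[offset][0] <= end
        ]


def is_fresh(signed_at: int, ttl_seconds: int, now: int) -> bool:
    """Assumed freshness rule: metadata is fresh until signed_at + ttl_seconds."""
    return now - signed_at <= ttl_seconds


stream = ToyStream()
stream.append(100, "temperature", "21.5")
stream.append(160, "humidity", "40")
stream.append(220, "temperature", "22.0")

# The index is consulted first; only matching DataLog entries are read.
early_temperatures = stream.get_range("temperature", start=0, end=200)
```

Keeping the index separate from the DataLog is what lets a reader resolve a tag query locally and then fetch only the data it needs.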
This paragraph lists a few drawbacks of Bolt: 1) it is fully dependent on the control plane; 2) devices cannot subscribe to a particular data stream generated by a device; 3) each device has its own data stream, and Bolt has no feature for merging data streams; 4) because Bolt uses cloud storage, it is prone to the pitfalls of current IoT applications that rely on the cloud for storage; and 5) global scalability will be a challenge, as Bolt lacks location-independent routing of segments. Bolt also uses custom IoT gateways, which can lead to interoperability issues.
The performance of Bolt was evaluated in two ways: microbenchmarks (compared against the operating system's raw disk reads and writes, referred to as DiskRaw) and real-world use cases. In the microbenchmarks, the authors measured write performance, read performance, and scalability. The comparison covered ValueStream, FileStream, and remote ValueStream. ValueStream was compared against a single file in DiskRaw, and FileStream against multiple files. The results show that ValueStream incurred higher overhead than DiskRaw for local writes. For remote streams, 64% of the total time was spent chunking and uploading the DataLog, and 3% went to uploading the index. FileStream's performance is comparable to DiskRaw for local writes. The storage overhead of ValueStream over DiskRaw was also measured, and it decreases with larger value sizes. The read performance of the local ValueStream was limited by the index lookup and data deserialization. For remote reads from ValueStream, the cost of the download dominated. FileStream shows similar performance characteristics. Chunking of streams helped improve read throughput for temporal range queries. Finally, the time taken to open a stream depends on the time needed to build the segment index in memory, and it grows linearly with the number of segments. The second part of the evaluation is explained in the next paragraph.
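One plausible way to read the storage-overhead trend is that each record carries a roughly fixed amount of extra data (for example, a timestamp and index entry), so the relative overhead shrinks as values get larger. The short Python sketch below only illustrates this arithmetic. The 32-byte per-record cost is a made-up number for illustration, not a figure from the paper, and relative_overhead is a hypothetical helper.

```python
def relative_overhead(value_size_bytes: int, per_record_overhead_bytes: int = 32) -> float:
    """Ratio of stored bytes to raw value bytes for a single record.

    per_record_overhead_bytes is a hypothetical fixed cost per record,
    chosen only to illustrate the trend.
    """
    if value_size_bytes <= 0:
        raise ValueError("value_size_bytes must be positive")
    return (value_size_bytes + per_record_overhead_bytes) / value_size_bytes


# Relative overhead for a few value sizes; it approaches 1.0 as values grow.
ratios = {size: relative_overhead(size) for size in (8, 32, 1024)}
```

Under this assumption, an 8-byte value is stored at five times its raw size, while a 1024-byte value is stored at only about 3% more than its raw size.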
The authors analyzed the feasibility and performance of Bolt with three real-world applications: PreHeat, Digital Neighborhood Watch (DNW), and Energy Data Analytics (EDA). The results were compared with the performance of the same applications running on OpenTSDB. For PreHeat, the average retrieval time from a remote ValueStream decreases as the chunk size increases. For DNW, chunks improve retrieval time by batching transfers, even though some of the downloaded data may not be needed. For EDA, a proportional increase in retrieval time was observed for both Bolt and OpenTSDB. Bolt outperforms OpenTSDB by an order of magnitude, primarily because it prefetches data in chunks. The storage overhead of Bolt is 3-5x lower than that of OpenTSDB for all three applications.
The experiments are excellent and show the benefits of the Bolt data management system. However, we found two weaknesses in the evaluation: 1) the comparison between OpenTSDB and Bolt may not be like-for-like, as OpenTSDB is a full database system layered on HBase (even though it supports time-series data), and 2) scalability is only weakly tested in the microbenchmarks.
To conclude, Bolt is a well-suited data management system for the emerging class of applications that manage IoT devices at home. It meets the requirements of these applications that existing platforms do not address. The experiments in the paper show that, compared to OpenTSDB, Bolt is up to 40 times faster with 3-5x lower storage overhead. The drawbacks listed above highlight the challenges that need to be solved before Bolt can be deployed in highly scalable use cases.