Bolt: Data Management of Connected Homes

Bolt is a data management system for an emerging class of applications that help IoT devices interact and store data. These applications have distinctive requirements: support for time-series and tagged data, the ability to share data between devices, and assurances of data confidentiality and integrity. Older platforms such as HomeOS and MiCasa Verde are unsuitable because they provide high-level abstractions mainly for device interaction, not for storage. The next paragraph describes the data manipulation characteristics of IoT applications, which are one of the main reasons Bolt was created.

The observed data manipulation characteristics of IoT applications are: 1) each stream has a single writer, 2) the writer always generates new data, 3) there is no random access to the data, and 4) readers retrieve proximate records from the data streams. Traditional databases, with their support for transactions, concurrency control, and recovery protocols, are overkill for such data, while file-based storage offers an inadequate query interface because files are accessed sequentially. In addition, data needs to be shared between applications and secured both in transit and at rest on the storage medium. The system should also support policy-based storage, which helps minimize cost and use resources efficiently. Bolt supports these requirements, unlike existing storage abstractions. The next paragraph explains the key techniques Bolt uses to tailor data management to these applications.

The four key techniques are chunking, separation of index and data, segmentation, and decentralized access control with signed hashes. First, chunking is the process of grouping a contiguous sequence of records into chunks. Data is stored and accessed at the granularity of chunks, which makes the system more efficient by reducing the round-trip delays incurred during data access, since records are transferred in batches. Second, separating the index from the data helps in two ways: 1) the index can be queried locally, and 2) the cloud does not have to be trusted, because data is stored encrypted in the cloud and decrypted only on the client side. Third, segmentation is the process of dividing a data stream into smaller segments of a user-defined size. It makes it possible to archive parts of a stream as the amount of data in it grows. Finally, Bolt uses decentralized access control and signed hashes to protect the confidentiality and integrity of data kept in untrusted cloud storage. It encrypts the data with the owner's secret key and distributes the keys via a trusted key server. The next paragraph outlines Bolt's implementation.
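To make the chunking idea concrete, here is a minimal sketch in Python. It is not Bolt's code: the record layout (timestamp, tag, value), the fixed chunk size, and the function names chunk_records and chunks_for_range are all assumptions made for illustration. It shows how a temporal range query maps to a small number of chunk fetches instead of one fetch per record.

```python
from typing import List, Tuple

# Assumed record layout for illustration: (timestamp, tag, value).
Record = Tuple[int, str, float]


def chunk_records(records: List[Record], chunk_size: int) -> List[List[Record]]:
    """Group a contiguous sequence of records into chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def chunks_for_range(chunks: List[List[Record]], start_ts: int, end_ts: int) -> List[int]:
    """Return the indices of chunks whose time span overlaps [start_ts, end_ts]."""
    hits = []
    for idx, chunk in enumerate(chunks):
        first_ts, last_ts = chunk[0][0], chunk[-1][0]
        if first_ts <= end_ts and last_ts >= start_ts:
            hits.append(idx)
    return hits


# Ten temperature readings with timestamps 0..9, grouped four per chunk.
records = [(t, "temperature", 20.0 + t) for t in range(10)]
chunks = chunk_records(records, chunk_size=4)

# A query for timestamps 3..5 touches two chunks, so two fetches
# serve three records (plus some neighbours that were not asked for).
needed = chunks_for_range(chunks, start_ts=3, end_ts=5)
print(len(chunks), needed)
```

The trade-off is the one the evaluation mentions later: batching reduces round trips, but a reader may download records it does not need.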

Bolt's APIs allow us to create a data stream, which is of one of two types: ValueStream or FileStream. The former is used for writing small data values such as temperature readings, and the latter for larger values such as images or videos. Data is added to the stream as a time-tag-value record using an append API. A stream consists of two parts: a log of data records (DataLog) and an index that maps a tag to a list of data item identifiers. When a stream is closed, Bolt chunks the segment's DataLog, compresses and encrypts these chunks, and generates a ChunkList. It then uploads the chunks, the updated ChunkList, and the index to the storage server. The chunks are uploaded in parallel, and the application can configure the maximum number of parallel uploads. Finally, the stream's integrity metadata is uploaded to the metadata server. As mentioned in the previous paragraph, streams are encrypted with a secret key known only to the owner. If the owner wants to give access to other readers, it updates the stream metadata with the secret key encrypted under each reader's public key. When reading data, a client first checks the integrity of the metadata using the owner's public key and its freshness using the TTL in the stream metadata, and only then downloads the index and DataLog.
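The two-part stream layout described above can be sketched as a small in-memory class. This is only an illustration: ToyValueStream, its method names, and the use of list positions as data item identifiers are hypothetical and are not Bolt's actual API. Chunking, compression, encryption, and upload on close are left out.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class ToyValueStream:
    """Hypothetical stand-in for a stream: an append-only log plus a tag index."""

    def __init__(self) -> None:
        # DataLog: append-only list of (timestamp, value) records.
        self.data_log: List[Tuple[int, Any]] = []
        # Index: tag -> list of data item identifiers (here, positions in data_log).
        self.index: Dict[str, List[int]] = defaultdict(list)
        self.closed = False

    def append(self, timestamp: int, tag: str, value: Any) -> int:
        """Add a time-tag-value record and return its item identifier."""
        if self.closed:
            raise RuntimeError("stream is closed")
        item_id = len(self.data_log)
        self.data_log.append((timestamp, value))
        self.index[tag].append(item_id)
        return item_id

    def get_range(self, tag: str, start_ts: int, end_ts: int) -> List[Tuple[int, Any]]:
        """Look up the tag in the index, then read matching records from the log."""
        return [
            self.data_log[i]
            for i in self.index.get(tag, [])
            if start_ts <= self.data_log[i][0] <= end_ts
        ]

    def close(self) -> None:
        """Mark the stream closed; a real system would chunk and upload here."""
        self.closed = True


stream = ToyValueStream()
stream.append(1, "temperature", 21.5)
stream.append(2, "humidity", 40)
stream.append(3, "temperature", 22.0)
recent = stream.get_range("temperature", start_ts=2, end_ts=3)
stream.close()
print(recent)
```

Because the index is a separate structure, a query only has to consult the index to find out which log entries it needs, which is what allows the index to be queried locally while the log itself sits in remote storage.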

Here are a few drawbacks of Bolt that I see: 1) it is fully dependent on the control plane, 2) a device cannot subscribe to a particular data stream generated by another device, 3) each device has its own data stream, and Bolt has no feature for merging streams, 4) because it uses cloud storage, it is prone to the pitfalls of current IoT applications that rely on the cloud for storage, and 5) global scalability will be a challenge because Bolt lacks location-independent routing of segments. Bolt also uses custom IoT gateways, which can lead to interoperability issues.

The performance of Bolt was evaluated in two ways: microbenchmarks, which compare Bolt's stream operations against raw operating-system reads and writes (DiskRaw), and real-world use cases. In the microbenchmarks, the authors measured write performance, read performance, and scalability. The comparison covered ValueStream, FileStream, and remote ValueStream. ValueStream was compared to a single file in DiskRaw, and FileStream to multiple files. The results show that ValueStream incurs higher overhead than DiskRaw for local writes. For remote streams, 64% of the total time was spent chunking and uploading the DataLog, and 3% went to uploading the index. FileStream's performance is comparable to DiskRaw for local writes. The storage overhead of ValueStream relative to DiskRaw decreases as the value size grows. Read performance of the local ValueStream was limited by the index lookup and data deserialization, while the cost of downloading dominated remote reads from ValueStream. FileStream shows similar performance characteristics. Chunking the streams helped improve read throughput for temporal range queries. Finally, the time taken to open a stream depends on the time needed to build the segment index in memory, and it grows linearly with the number of segments. The second part of the evaluation is explained in the next paragraph.

The authors conducted a feasibility and performance analysis of Bolt with three real-world applications: PreHeat, Digital Neighborhood Watch (DNW), and Energy Data Analytics (EDA). The results were compared with the performance of the same applications running on OpenTSDB. In PreHeat, the average retrieval time from a remote ValueStream decreases as the chunk size increases. In DNW, chunks improve retrieval time by batching transfers, even though the application downloads additional data it might not need. In EDA, retrieval time increased proportionally for both Bolt and OpenTSDB. Bolt outperforms OpenTSDB by an order of magnitude, primarily because it prefetches data in chunks. Bolt's storage overhead is 3-5x smaller than OpenTSDB's for all three applications.

The experiments are excellent and show the benefits of the Bolt data management system. However, I found two weaknesses in the evaluation: 1) the comparison between OpenTSDB and Bolt may not be a fair one, as OpenTSDB is a general-purpose time-series database built on HBase rather than a store tailored to home workloads, and 2) scalability is only weakly tested in the microbenchmarks.

To conclude this summary, Bolt is a well-suited data management system for the emerging class of applications that manage IoT devices at home. It meets the requirements of these applications that existing platforms do not. The experiments in the paper show that, compared to OpenTSDB, Bolt is up to 40 times faster with 3-5x less storage overhead. The drawbacks listed above highlight the challenges that need to be solved in order to deploy Bolt in a highly scalable use case.



The Cloud is Not Enough: Saving IoT from the Cloud

The Internet of Things (IoT) represents a new class of applications that leverage the advantages of the cloud. The cloud has allowed us to collect data from sensors and stream it for storage and processing without worrying about the economic viability of doing so. But the current approach of connecting IoT applications directly to the cloud has many drawbacks. Concerns regarding the privacy, security, scalability, latency, bandwidth, availability, and durability of the data generated by these applications have not been addressed. To overcome these drawbacks, the paper adopts a data-centric approach that creates an abstraction between IoT applications and the cloud.

The data-centric abstraction is called the Global Data Plane (GDP), and it focuses on the distribution, preservation, and protection of data. It supports the same application model as the cloud while better matching the needs and characteristics of the IoT by using heterogeneous computing platforms, such as small gateway devices, moderately powerful nodes in the environment, and the cloud, in a distributed manner. The foundation of the GDP is the secure single-writer log, and applications built on top of it are interconnected through log streams rather than by addressing devices or services via IP.

The data generated by IoT devices is represented as logs, also called single-writer time-series logs. Each log is append-only and mostly read-only, and it can be securely replicated and validated through cryptographic hashes. The log-based approach deals with the issues of flexibility, access control, authenticity, integrity, encryption, durability, and replication of data. These logs also need to be stored on some infrastructure, and the current approach to storage in the cloud does not offer flexible placement, low latency, or durability of information. To provide these properties, the paper introduces location-independent routing, in which packets are routed through an overlay network that uses Distributed Hash Table (DHT) technology. Dynamic topology changes, pub/sub, and multicast trees can be built over this overlay network in order to optimize latency and network bandwidth. Although the GDP can provide most of the functionality that applications need, some applications may need additional support, which can be provided by a Common Access API (CAAPI). A CAAPI is a layer above the GDP layer and plays a major role in replaying logs when a service fails. Checkpointing techniques can be used to avoid the overhead incurred by log replay.
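One way to picture a log that can be "validated through cryptographic hashes" is a simple hash chain, sketched below in Python. This is an assumption for illustration, not the GDP's actual log format: the class AppendOnlyLog, the function verify, the all-zero starting hash, and the choice of SHA-256 are all hypothetical. Writer signatures, which a real single-writer log would need so readers can trust the head hash, are omitted.

```python
import hashlib
from typing import List, Tuple

# Assumed starting value for the chain; any fixed 32-byte constant would do.
GENESIS = b"\x00" * 32


def entry_hash(prev_hash: bytes, payload: bytes) -> bytes:
    """Hash an entry together with the hash of the entry before it."""
    return hashlib.sha256(prev_hash + payload).digest()


class AppendOnlyLog:
    """Toy single-writer log: entries can be appended but never modified."""

    def __init__(self) -> None:
        self.entries: List[Tuple[bytes, bytes]] = []  # (payload, chained hash)

    def append(self, payload: bytes) -> bytes:
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = entry_hash(prev, payload)
        self.entries.append((payload, digest))
        return digest


def verify(entries: List[Tuple[bytes, bytes]], expected_head: bytes) -> bool:
    """Recompute the chain over a replica and compare it with the trusted head."""
    prev = GENESIS
    for payload, digest in entries:
        if entry_hash(prev, payload) != digest:
            return False
        prev = digest
    return prev == expected_head


log = AppendOnlyLog()
log.append(b"temp=21.5")
head = log.append(b"temp=22.0")

replica = list(log.entries)                        # an honest copy of the log
tampered = list(log.entries)
tampered[0] = (b"temp=99.0", tampered[0][1])       # a replica that altered a record

print(verify(replica, head), verify(tampered, head))
```

Since each hash covers the previous one, a reader who trusts only the latest head hash can check an entire replica fetched from an untrusted node, which is what makes replication across gateways and cloud nodes safe in this kind of design.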

The data-centric approach used in this paper helps overcome the pitfalls of today's IoT applications. Although these problems are also prevalent in web applications, they become more complex in the IoT space. This writeup is based on the paper “The Cloud is Not Enough: Saving IoT from the Cloud”[1].

[1]:https://www.usenix.org/conference/hotcloud15/workshop-program/presentation/zhang

