Developing Open Data solutions is often challenging for several reasons. The first challenge, once the application is defined, is to find the right data source and process it in a useful way. There are several governmental Open Data initiatives worldwide, such as those of Brazil, the USA, and the UK (to name only three). Two common ways of consuming such data are downloading the complete dataset or accessing it through an Open API. The latter spares developers from building their own data processing infrastructure, which many cannot (or do not want to) afford.
A common assumption among developers creating such applications is: give me tons of data and I will process it in a useful way. However, producing this data carries a significant cost. One usual approach is to generate large CSV or JSON files extracted from OLTP or OLAP systems; another is to collect data from different sensors.
Our group (C3SL: Centro de Computação Científica para Software Livre) developed a monitoring and evaluation (M&E) system for the Ministry of Communications of Brazil, in which we had to deal with 1) the production and collection of data, 2) the transformation of huge data streams into useful information, and 3) the provision of intuitive graphical analytic interfaces. The system monitors digital inclusion initiatives, which, in short, are devices (computers) made available to the public at different points across the country. It receives information on availability, hardware and software inventory, and network bandwidth usage, amounting to billions of records. This makes it possible to check whether the devices are being used, and used correctly.