Our group has been working on several initiatives to produce useful open data for end users. One of them is the collection and monitoring for assessment and evaluation. Another focuses on the integration and search of OER (Open Educational Resources). In this post, I will focus on a third initiative, in which we extract existing educational open data and provide an end-to-end framework called LDE (Laboratório de Dados Educacionais, the Educational Data Laboratory). As we can see, there are many research and development opportunities in this field and, as a huge plus, we can provide useful services with the outcome.
Still, these opportunities come with several challenges. Providing useful open data requires a complete team, from the domain expert to the web developer, so it often cannot be handled by a data scientist alone (more work for us!). In addition, processing large amounts of data is not always possible on a desktop computer or with a spreadsheet-based application. There are many trade-offs to be made, which means that, for technical reasons, it is not always possible to choose the best state-of-the-art approach. We have faced five main challenges:
  • Challenge 1: how to find the most effective specification format for domain experts to exchange with the database and application developers.
  • Challenge 2: the open data is released on a yearly basis by the Brazilian government, and we have no control over the data sources provided. This means that we had to handle both schema and data evolution.
  • Challenge 3: given our time constraints, how to choose the most appropriate data model, one with little impact on query development that still delivers good performance.
  • Challenge 4: how to couple query development with a REST API, since the main goal was to make the data easily available.
  • Challenge 5: how to develop an attractive front end quickly.
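
To make Challenge 2 more concrete, here is a minimal sketch of one way to absorb yearly schema changes: keep a per-year mapping from the source column names to a stable, canonical set of names. This is only an illustration, not the approach actually used in LDE. The column names, the years, and the functions `normalize_row` and `normalize_release` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical per-year mappings from source column names to canonical names.
# The names below are invented for illustration; they are not real columns
# from the government releases.
COLUMN_MAPPINGS: Dict[int, Dict[str, str]] = {
    2016: {"COD_ESCOLA": "school_id", "NUM_MATRICULAS": "enrollments"},
    2017: {"CO_ESCOLA": "school_id", "QT_MATRICULAS": "enrollments"},
}


def normalize_row(year: int, row: Dict[str, str]) -> Dict[str, str]:
    """Rename the columns of one source row to the canonical schema."""
    mapping = COLUMN_MAPPINGS[year]
    # Fail loudly if a release dropped or renamed a column we depend on.
    missing = [source for source in mapping if source not in row]
    if missing:
        raise KeyError(f"{year} release is missing expected columns: {missing}")
    # Columns that are not in the mapping are ignored.
    return {canonical: row[source] for source, canonical in mapping.items()}


def normalize_release(year: int, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Normalize every row of a yearly release."""
    return [normalize_row(year, row) for row in rows]
```

With a mapping like this, a new yearly release only needs a new entry in `COLUMN_MAPPINGS`, and the queries written against the canonical names stay unchanged. A renamed or missing column shows up as an error at load time rather than as a silently wrong indicator.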

We have learned, and are still learning, several lessons from developing this project, ranging from human to technological ones. We can say with confidence that developing a complete Open Government Data initiative is a very complex task, one that isolated domain experts or data scientists would find hard to deliver on their own. The main computational difficulty is not providing a single metric or indicator, which is doable with less development effort (although, of course, the domain experts put a huge effort into choosing and organizing the best indicators). It is keeping a continuous pace of delivering new indicators over very large amounts of data, and updating them every time a new data release becomes available. The work, as seen in the screenshot below, is also available at https://dadoseducacionais.c3sl.ufpr.br/#/indicadores/

We have an article with more details about the overall framework at ICWE 2018, entitled "Educational Open Government Data: from requirements to end users". I intend to detail the challenges we faced in subsequent posts.