Defining Application Requirements

Building a Data Visualization App with Flask and React (Part 1)

Introducing this Article Series

In this series of blog posts I am going to demonstrate how to create a web application from the design phase all the way through deployment. I have chosen to make this application using Flask, a backend Python web framework, and React, a frontend JavaScript library. While much of the information in this series is going to be useful for anyone interested in the process of building a web application, there are going to be many parts that dig into the particularities of the technologies I'm using. If a post is going to dive deep into the code for a particular technology, I'll give you a heads-up beforehand.

You can see the code for this application at: [https://github.com/Nunie123/narratus]

You can also check out:

Backend:
  1. Part 1: Gathering Requirements
  2. Part 2: Picking a Tech Stack
  3. Part 3: Building a Data Model - (coming soon)
  4. Part 4: Authentication with JSON Web Tokens (JWT) - (coming soon)
  5. Part 5: Building the API Endpoints - (coming soon)
  6. Part 6: Using Automated Testing - (coming soon)
  7. Part 7: Saving Passwords with AWS's Key Management System - (coming soon)
  8. Part 8: Executing User-Provided SQL Against a 3rd Party Database - (coming soon)
  9. Part 9: Deploying Backend to Ubuntu server - (coming soon)

Frontend:
  1. Part 1: Building the Views - (coming soon)
  2. Part 2: Incorporating Redux - (coming soon)
  3. Part 3: Pulling Data from API - (coming soon)
  4. Part 4: Testing - (coming soon)

The First Step: Gathering Requirements

When building any application the first step is gathering and recording requirements. There's no way to build an application unless you know what the application is going to do. And having a general sense of what you want to build is not sufficient ("It'll be like Facebook, except I'll be the billionaire instead of Zuckerberg").

Coding is very much a detail-oriented practice. While you don't need to plan out every function and every variable, you do need a list of features describing what the application should be able to do. This list is called the requirements because all of the included features are required before the application can be considered complete. It can sometimes be helpful to frame them as User Stories ("As a User, I can view all of the other users I have identified as my friends in a single view").

Frequently you will need to go to other people to find out the requirements (thus the gathering of requirements). If you work on a professional software development team you will likely have a Product Owner who is in charge of defining the requirements. On smaller teams, or when working by yourself, you'll have to gather the requirements on your own. In gathering requirements it's important not to just think about what the user wants, but to go out and actually talk to prospective users to find out what is valuable. There have been many well-built applications that were complete failures because they were based on features users didn't actually want.

There is a good chance that you will be unable to build an effective application without soliciting user feedback. If you are planning on making a commercial product, this is critical. If you're doing a personal project to explore a new technology or practice your skills, build an application that you will use. Now you have easy access to a prospective user: yourself.

When I compiled the requirements for the application I'm building in coordination with these blog posts, called Narratus, I knew the intended users were the members of the data engineering team I work on. Our team has had many group discussions about improved tooling to increase our productivity. When preparing for this project I had a long brainstorming session with our Team Lead about the tooling we wish we had to do our jobs. I took notes and distilled that conversation into the requirements document shown below.

Level of Detail in Requirements Documents

Getting the right level of detail in the requirements is a balancing act. If you provide too little detail then you leave too much decision making about requisite functionality to the discretion of the person writing the code. Even if the person writing the code is the same person who defined the requirements, the middle of a coding session is the wrong time to be thinking about what features should be implemented. Even if you do an excellent job defining your requirements on the fly, you'll have to context switch to put your requirements-gathering hat on, which is a recipe for wasted time and worse code.

Too much detail has its own problems. First, it means you'll be spending an awful lot of time writing requirements before you can get down to writing code. In [Waterfall application development](https://en.wikipedia.org/wiki/Waterfall_model) it is typical to go into a great amount of detail about everything that will be built and how it will work. In theory this doesn't seem so bad, as it gives management an opportunity to approve the work to be done and the developers don't end up making independent design decisions that conflict with each other. In practice, however, too much detail can lead to problems (which is why the Waterfall Model has given way in most companies to Agile development). These problems usually occur because the requirements will often need to be changed over the course of development, and the Waterfall model does not handle these changes well.

Requirements Change

While gathering the requirements is a critical initial step in building an application, do not assume that the process of gathering your requirements has ended. As mentioned briefly above, expect your requirements to change. If you're working on a professional development team you should be seeking feedback from your users throughout the project: showing them prototypes and beta versions or asking clarifying questions. This will frequently result in requirements being added, changed, or dropped. This iterative process of refining an application based on user feedback is a central tenet of Agile software development.

In addition to requirements changes from users, there may be changes that result from the development process itself. It could be that in the process of writing the actual code you realize there are features that probably should have been included ("Is there a reason we are only putting a logout button on half of the views in our application?"). Or you may find that a requirement is too vague to be implemented ("What do you mean by 'The user should be able to talk to the application'?"). It's also common to find that what appeared to be a small feature will end up more than doubling the complexity of the application ("Adding an AI Chatbot to our flashlight application seems like a cool feature, but maybe we'd prefer to ship within the decade, instead.").

Well, if we're just going to change the requirements then what's the point? The point is that you need some way to define what you will be coding and when you will be done. While you can and should make changes, those changes should be made with careful deliberation and, ideally, lots of research. Remember that software is for the users, so their needs should always be foremost when thinking about requirements.

Gathering Requirements for Narratus

Like all good applications, Narratus started when I recognized a problem and wanted to explore how an application might address it. One of the core responsibilities of my team at work is to perform analyses on the data generated by another team's application and then turn those analyses into automated reports. I noticed that our team is pretty good at transforming the data and performing analyses, but is not as productive as I would hope when translating those analyses into informative and visually appealing reports.

Right now we have a custom Python application (a version of which has been published on GitHub) that executes predefined SQL queries against our Data Mart and pipes that data into custom HTML/CSS/JavaScript files, which are printed to PDF and emailed. While this application has served us well, and was quite an achievement for us given the constraints we were working under when it was developed, it has some fundamental shortcomings that suggest a complete rewrite is warranted rather than incremental improvements.
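To make that pipeline a little more concrete, here is a minimal sketch of its first two stages: running a predefined SQL query and piping the results into HTML. This is not the code from our actual application. An in-memory SQLite database stands in for the Data Mart, and the `sales` table and the function names `run_query`, `render_report`, and `build_demo_report` are all hypothetical, made up for illustration. The PDF and email steps are left out, since they depend on additional tooling.

```python
import sqlite3
from html import escape


def run_query(conn, sql):
    """Run a predefined SQL query and return (column_names, rows)."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


def render_report(title, columns, rows):
    """Render query results as a simple HTML page containing one table."""
    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return (
        f"<html><body><h1>{escape(title)}</h1>"
        f"<table><tr>{header}</tr>{body}</table></body></html>"
    )


def build_demo_report():
    """Build a small example report from made-up data."""
    # Assumption: in-memory SQLite stands in for the real Data Mart.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, total INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)", [("East", 120), ("West", 95)]
    )
    columns, rows = run_query(
        conn, "SELECT region, total FROM sales ORDER BY region"
    )
    conn.close()
    return render_report("Sales by Region", columns, rows)
```

Calling `build_demo_report()` returns the report HTML as a string. In the workflow described above, that HTML would then be printed to PDF and emailed.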

First, the methods of report publication are inherently limited. The application can save reports to the server or email them out. Because there is no GUI, there is no way to display reports on the internet, our company intranet, or locally. Despite using HTML to generate the reports, the final reports are only available in PDF (or as Excel files if there are no visualizations).

Second, our process for generating visualizations through custom JavaScript is slower than we would like. We are known as the Data Team, so perhaps it's not surprising that we're not rock star JavaScript developers. However, data visualization is a core function of our team. The faster we are able to generate and iterate on visualizations and reports, the more productive we will be. While we have looked at various BI tools to help with visualization, including a long look at Tableau, we decided that the tools available didn't quite meet our needs. Given the expense (we were quoted around $20k/year for Tableau, which has similar pricing to the rest of the BI industry), we were reluctant to get locked in to a product that only did 90% of what we needed.

Finally, our queries, visualizations, and reports could use some better organization. We produce a lot of SQL files, but we don't have a convenient way to browse and tag our files. We have a lot of reports that are often rewritten, abandoned, and reinstated. It can sometimes be a bit of a process to figure out which code goes with which report, and which reports are currently being published. Right now we mostly convey this through a combination of file structures, naming conventions, SQL tracking tables, and memory.

Our team has had a series of conversations about what sort of tooling improvements would allow us to be more productive. Those conversations, as well as a detailed conversation I had with our Team Lead, constituted the research I did to build the requirements document below.

Narratus Requirements

The requirements document resides in the Narratus repo here.

  1. Report Generation
    1. Query databases to retrieve data.
    2. Create Excel file from tabular data.
    3. Create charts from data.
    4. Compose charts and tables into PDF document.
    5. Compose charts and tables into HTML document.
    6. Automated report generation.
    7. Logging.
  2. Report Publication
    1. Publication types:
      1. Email with attached report.
      2. Email with embedded report (e.g. embedded HTML email).
      3. Download to user's computer.
      4. Update website.
    2. Recipient types:
      1. Passive - Recipient receives email reports controlled by administrator.
      2. Active - Recipient can access application to download or subscribe to reports.
    3. Automated report publication.
    4. Logging.
  3. Data Exploration
    1. User interface to create charts from query results.
    2. Chart design updated with existing dataset.
    3. Chart design persists for new query results.
    4. Queries and charts easily saved.
  4. Code Organization
    1. Reports saved in one location and categorized.
    2. Display report attributes:
      1. Recipients.
      2. Visualizations.
      3. Queries.
      4. Update/publication schedule.
      5. Publication method(s).
  5. Copy Data Between Databases
    1. Select source and destination databases, schemas, and objects.
    2. Schedule regular updates.
    3. Specify updating all data or only new records.
    4. Logging.
  6. Update Database Objects
    1. Update derived tables from copied data.
    2. Add indexes to copied and derived tables.
    3. Logging.