Home » Categories » CHARTrunner

CR: A simple query pattern for summarizing raw data

CHARTrunner customers like to visualize piles of data in chart form. Something about a neat line going across the page is easier to consume than rows and columns of numbers. Recently, we’ve had several calls that follow a similar pattern. The customer has a data set containing what I’ll call rawdata and they want to make a chart. They are quite good at using CHARTrunner but their data set is not arranged in a form that is handy for the chart they want.

Consider data from a table named Orders:

Date

OrderType

ProductCategory

OrderTotal

4/1/2009

Web

Book

12.76

4/1/2009

Phone

Music

14.34

4/1/2009

Walk-in

Art

23.89

4/1/2009

Web

Coffee

3.57

4/1/2009

Web

Book

11.29

4/2/2009

Web

Music

22.15

4/2/2009

Phone

Music

10.48

Etc.

     

Imagine you want to make a chart of Sales. To do this, you tell CHARTrunner to use the OrderTotal as a measurement. This will work – but the chart may not be that interesting and it may not be what you want. It will have one data point for each row in the table. It is a chart of the OrderTotal for each order.

A more interesting chart would show the sum of OrderTotal for each unique Date; a Daily Sales chart. As you can see, this daily total is not represented in the table. One way to solve this problem might be to bring the data into Excel and use Excel features such as formulas and/or pivot tables to create the chartable data you really want. The problem is that the data massaging work must be repeated each time you want see an updated chart of total daily sales.

A more effective approach is to summarize the data as part of a query. For example, instead of getting the result data above, we want to get a data set that looks like this:

Date

TotalSales

4/1/2009

65.85

4/2/2009

83.11

4/3/2009

109.32

4/4/2009

22.80

Etc.

 

In the table above, we have one row of data for each unique Date. The row contains a sum of the OrderTotal rows for a given date. In this table, we have aggregated rows from the raw data table into single rows for each date. How do we do this in CHARTrunner and avoid massaging our raw data each time we want a new chart?

In CHARTrunner, on the chart definition form, under the data definition tab, you will find these choices for where to get data for a chart:

Table
Query or view
Stored procedure
Custom query

When you select Custom query you need to write an SQL query to specify how to select data. Writing an SQL query is intimidating for some users but I would suggest that you give it a try. It provides a lot of power and is not as difficult as you might think. To solve the problem described above, we will write a custom query.

The simplest query we could write from the Orders table might look like this:

SELECT * From Orders

This would give us the raw data just as we see in the first table above. The asterisk means that we want data from ALL the columns in the table. We could also write the query like this instead:

SELECT Date, OrderTotal FROM Orders

This would give us all the rows, but only include the two columns we specified. Enough of the basics, here is the query we need to solve our problem:

SELECT SUM(OrderTotal) as DailyTotal, MIN(Date) as ActualDate FROM OrderTotal GROUP BY Date ORDER BY Date

This query contains the aggregate functions SUM and MIN. An aggregate function summarizes multiple rows from the data table. When you use an aggregate function, you must explain how the groups of rows should be formed. For example, which rows should be added up to provide the SUM we asked for? We use the GROUP BY clause for this.

In the example above we are saying: form groups of rows using the Date column. We will get one row in the results table for each unique Date that we find. In each row, there will be a DailyTotal column that contains the SUM of the OrderTotals for the date.

The query also provides an ActualDate column in the results table. Since we are using GROUP BY, we only get ONE row for each date. Since the row summarizes many rows, we have to use an aggregate function on the Date column also. This is just a trick to ensure the results table will have a Date column that we may want to use for X-axis labels on our chart.

Don’t be intimidated by this query. Copy it and paste it into a new chart definition in CHARTrunner. Adjust it so it works with your table name and your column names. Once you know how to write aggregate queries like this, a whole new range of charts is possible based on the raw data that you already have flowing into many data sources. The data does not need to massaged each time you display the chart. Define the query once then reuse it often.

The .zip file which accompanies this article contains an Excel sheet with the example data above. It also contains several CHARTrunner chart definitions that look at this data in various ways using example custom queries. One of these may provide a good query pattern for you to start with analyzing your own data. Unzip this into some temporary folder. Run CHARTrunner and make this the working folder. Display each chart then study the custom query behind it. Happy charting!

Attachments (1) Attachments
Related Articles RSS Feed
CR: Some Excel Spreadsheet Values Seem To Be Missing
Viewed 2212 times since Fri, Nov 9, 2007
Cr: Charting elapsed time
Viewed 2450 times since Fri, Nov 9, 2007
CR: Collating sequence not supported
Viewed 2626 times since Tue, Dec 11, 2007
CR: Histogram bars beyond specification lines
Viewed 2314 times since Fri, Nov 9, 2007
CR: Grouping counts by date in Excel
Viewed 2170 times since Fri, Nov 9, 2007
CR: CR1.5 - My control chart data lines are not printing
Viewed 2183 times since Fri, Nov 9, 2007
CR: Charting pass-fail data as a p-chart
Viewed 4450 times since Fri, Feb 29, 2008
CR: CR1.1 - Charts in the Chart Workspace do not print all of the charts
Viewed 2044 times since Fri, Nov 9, 2007
CR: Data grouping tardiness
Viewed 3161 times since Tue, Oct 6, 2009
CR: CR1.1 - Notes and Causes Problem
Viewed 2088 times since Fri, Nov 9, 2007