Analysis and persuasion. A large part of my day-to-day work involves visualising data for either or both of these purposes. I consolidate data into charts and tables, sometimes to help me see patterns and sometimes to help others see what the data has to say. Excel has been an important tool along the way, but tools on their own aren’t enough. Without due care, good tools can pave a path to a bad chart like this one, which fails to inform. Making charts that deliver a clear message about the story in the data, and that do so in the 8 seconds available to get a reader’s attention, is not something most of us are naturally good at. And it’s[…]

# Category: Data Science

A useful but often overlooked Excel feature is the Analysis ToolPak. It’s useful because it packages about 20 commonly used statistical functions in a format that is very easy to use. It gets overlooked because it has to be activated through the Excel Options dialog before it even becomes visible. If the Analysis ToolPak is new to you, you will find it in the Data segment of the Excel Tool Ribbon. If you already use it and you are in the process of extending your analysis repertoire to include R, you may just be looking for a guide that shows how to do ToolPak tasks in R. Either way, this post fills a gap with a quick mapping from Excel to[…]

Excel is such a handy tool for data discovery and analysis that it’s fair to ask, “Why bother with anything else, especially an arcane scripting environment like R?” The truth is that the transition from Excel to R is very much a green eggs and ham experience: decidedly unappealing at the outset, even for adventurous folk, but rewarding in the long run. This post takes another look at the Titanic data set, this time using R to repeat the analysis done last time in Excel. As before, the starting point is making readable text from the raw, coded data shown above on the left. R provides a simple substitution function that specifies the filter conditions for the variable[…]

If data is a window into a problem, then one of the main goals of data science is getting people to look through the window: Do you see patterns? Are the apparent patterns real? Do they help you understand the situation? Can you use what you see to predict beyond the horizon? So, how well do everyday desktop tools like Excel fare in helping to provide answers? Teachers of introductory statistics courses (like me) are fond of starting off the discovery process in the form of a game. The class is presented with the data set shown here, giving the basic facts of an unspecified risk event involving a large loss of life. The group is invited to ask questions about[…]

The client wanted 30 minutes. More precisely, they needed to eliminate a 30-minute delay between completing a finished product test in a manufacturing quality control lab and communicating the result back to the operators on the production line. Simply automating that reporting step made data more readily available to inform decisions about process equipment settings, which led to lower scrap rates and higher throughput – a win for everyone. Another client, also a manufacturer, needed to optimize finished goods inventory and production scheduling for a line of consumer electrical products that had 20 or so discrete models. A statistical model used a rolling 3 years of order data to predict optimum monthly production volumes for each product. Reviewing the model[…]