02 June 2014
"The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades, … because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it."
Hal Varian, Google’s Chief Economist
“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”
Herb Simon
Pythagoras' theorem is a relation in Euclidean geometry among the three sides of a right triangle. It states that the square of the hypotenuse (the side opposite the right angle) is equal to the sum of the squares of the other two sides.
For every \(\triangle XYZ\) where \(\angle XYZ = 90^\circ\) and the sides have lengths \(XY = a\), \(YZ = b\) and \(XZ = c\), the following relationship holds:
\(a^2 + b^2 = c^2\)
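As a quick numerical check of the formula, here is a minimal R sketch. The helper name `hypotenuse` is an illustration only and does not come from the original slides.

```r
# Hypothetical helper: length of the hypotenuse given the two legs a and b
hypotenuse <- function(a, b) {
  sqrt(a^2 + b^2)
}

# The classic 3-4-5 right triangle
side_c <- hypotenuse(3, 4)

# Confirm that a^2 + b^2 equals c^2 (up to floating-point tolerance)
isTRUE(all.equal(3^2 + 4^2, side_c^2))
```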
The First Six Books of the Elements of Euclid by Oliver Byrne
Simple
Complex
Derived from the Latin verb videre, "to look, to see"
"The act or instance to form a mental image or picture (without an object) or… the act or instance to make visible or visual (with an object)"
“Transformation of the symbolic into the geometric” - McCormick et al. 1987
“The use of computer-generated, interactive, visual representations of abstract data to amplify cognition.” - Card, Mackinlay, & Shneiderman 1999
Expression | Decorative - Data Art for visual expression, delight (and impact)
Flight Patterns, Internet Census
Exploration | Interactive - Data Tool for engagement, exploration and discovery
Cricket Batting, Working Capital Profiler
Explanation | Narrative - Data Stories for telling a specific and (mostly linear) visual narrative
The Joy of Stats, Wealth Inequality, Out of Sight, Out of Mind
Graph of rate of evaporation of water vs. temperature
Book: Sémiologie graphique / Semiology of Graphics
Visual language is a sign language
“… finding the artificial memory that best supports our natural means of perception.”
∴ Encode quantitative variables
“Resemblance, order and proportion are the three signifieds in graphics.” - Bertin
anscombe
##    x1 x2 x3 x4    y1   y2    y3    y4
## 1  10 10 10  8  8.04 9.14  7.46  6.58
## 2   8  8  8  8  6.95 8.14  6.77  5.76
## 3  13 13 13  8  7.58 8.74 12.74  7.71
## 4   9  9  9  8  8.81 8.77  7.11  8.84
## 5  11 11 11  8  8.33 9.26  7.81  8.47
## 6  14 14 14  8  9.96 8.10  8.84  7.04
## 7   6  6  6  8  7.24 6.13  6.08  5.25
## 8   4  4  4 19  4.26 3.10  5.39 12.50
## 9  12 12 12  8 10.84 9.13  8.15  5.56
## 10  7  7  7  8  4.82 7.26  6.42  7.91
## 11  5  5  5  8  5.68 4.74  5.73  6.89
Mean
\(\mu_x = 9\); \(\mu_y = 7.5\)
Variance and Correlation
\(\sigma^2_x = 11\); \(\sigma^2_y = 4.1\); \(cor(x,y) = 0.816\)
Linear Regression
\(y = 3.00 + 0.500x\)
\(R^2 = 0.667\)
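The summary figures above can be reproduced for all four x/y pairs with a short R sketch. It uses only the built-in `anscombe` data set and base R functions; the helper name `summarise_pair` and the column labels are illustrative choices, not part of the original slides.

```r
# anscombe ships with base R (package 'datasets'); no extra packages needed
data(anscombe)

# Summarise one x/y pair: means, variances, correlation and the fitted line
summarise_pair <- function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x    = mean(x),
    mean_y    = mean(y),
    var_x     = var(x),
    var_y     = var(y),
    cor_xy    = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}

# One row per data set (1 to 4), rounded for display
anscombe_summary <- t(sapply(1:4, summarise_pair))
rownames(anscombe_summary) <- paste("set", 1:4)
round(anscombe_summary, 2)
```

The four rows come out nearly identical, so the summary statistics alone cannot tell the four data sets apart.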
## John Tukey (1977) Exploratory Data Analysis: An approach to analyzing data sets to summarize their main characteristics, often with visual methods
Book: The Visual Display of Quantitative Information
“Above all else, show the data.”
Data-Ink ratio = data-ink / total-ink used in graphics
Don't lie with statistics
Book: The Elements of Graphing Data
"The important criterion for a graph is not simply how fast we can see a result; rather it is whether through the use of the graph we can see something that would have been harder to see otherwise or that could not have been seen at all."
Aspects | Macintosh | MacBook | Change |
---|---|---|---|
Year | 1984 | 2014 | +30 |
Cost | $2,500 | $999 | 2/5x |
Speed | 8MHz | 1.4GHz | 175x |
Memory | 128KB | 4GB | 30,000x |
Pixels | 512 x 342 | 1440 x 900 | 7.4x |
Screen | 72PPI (9in) | 128PPI (13.3in) | 1.8x |
Book: The Grammar of Graphics
Grammar: “the fundamental principles or rules of an art or science”
"…rules for constructing graphs mathematically and then representing them as graphics aesthetically."
Three metaphors for thinking about visualization
Base Graphics: Written by Ross Ihaka, based on experience from S graphics. It follows a pen-on-paper model, with no user-accessible representation of the graphics. Base graphics functions are generally fast, but have limited scope.
grid graphics: Developed by Paul Murrell (2000). Grid grobs (graphical objects) can be represented independently of the plot and modified later. Grid provides drawing primitives, but no tools for producing statistical graphics.
lattice: Developed by Deepayan Sarkar (2008). It uses grid graphics to implement the trellis graphics system of Cleveland. Conditioned plots are easy to produce, but it lacks a formal model.
ggplot2: Developed by Hadley Wickham (2007). It combines the strengths of lattice with an underlying layered grammar of graphics. A wide range of graphics can be drawn with compact syntax and independent components.
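To make the pen-on-paper model of base graphics concrete, here is a minimal sketch that draws the four pairs of the built-in `anscombe` data set. The function name `plot_anscombe_base`, the axis limits and the colour are illustrative assumptions, not taken from the original slides.

```r
# Draw the four Anscombe pairs with base graphics only.
# Each call adds ink to the current device; no plot object is returned
# that could be inspected or modified afterwards.
plot_anscombe_base <- function() {
  old_par <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))  # 2 x 2 grid of panels
  on.exit(par(old_par))                                  # restore settings on exit
  for (i in 1:4) {
    x <- anscombe[[paste0("x", i)]]
    y <- anscombe[[paste0("y", i)]]
    plot(x, y, xlim = c(3, 20), ylim = c(2, 14),
         pch = 19, main = paste("Set", i))               # points first
    abline(lm(y ~ x), col = "steelblue")                 # then the fitted line on top
  }
  invisible(NULL)
}
```

Calling `plot_anscombe_base()` draws to whichever graphics device is active. The order of the calls matters because later ink is simply painted over earlier ink.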
install.packages('ggplot2')
library(ggplot2)
Main arguments
General ggplot syntax
ggplot(data, aes(…)) + geom_x() + … + stat_x() + …
Layer specifications
Additional components: scales, coordinates, facet
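The general syntax can be illustrated with a small sketch on the built-in `anscombe` data set. It assumes ggplot2 is already installed; the long-format data frame `anscombe_long` and its column names are illustrative choices, not part of the original slides.

```r
library(ggplot2)  # assumes ggplot2 is already installed

# Reshape the built-in anscombe data to long form: one row per (set, x, y)
anscombe_long <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(set = paste("Set", i),
             x   = anscombe[[paste0("x", i)]],
             y   = anscombe[[paste0("y", i)]])
}))

# Data and aesthetic mapping first, then layers, then a facet specification
p <- ggplot(anscombe_long, aes(x = x, y = y)) +
  geom_point() +                                   # geom layer: raw observations
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                        # stat-backed layer: fitted line
  facet_wrap(~ set)                                # facet: one panel per data set
```

Typing `p` at the console renders the plot. Because the plot is an ordinary object, further layers or scales can be added to `p` with `+` before it is drawn.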
You are working as a team member in a large global project to develop the digital ad strategy for your company. As part of the project, you need to provide an overview of the computing devices the consumers are likely to use to interact with these digital ads.
You have received a spreadsheet from an analyst about these computing devices. The devices are tracked in three main categories: PCs (including desktops and laptops), Tablets and Smartphones. The data sheet includes historical and forecasted data on shipments (devices shipped to the consumer) and installed base (devices being used by the consumers) for these computing devices. In addition, you also have the same data segmented by the Operating System (OS) used on each of these devices.
The data sheet by the analyst is available at http://goo.gl/Zy6lcR
You need to develop a short data visualization for this data set and problem statement (using your preferred visualization and presentation tool). Please use the data shared by the analyst; you are free to enrich it with any additional data or insights from external sources.
You will have 5 minutes to share this overview with the global project team as part of the next project discussion. Please prepare the visualizations accordingly.
Amit Kapoor
Partner, narrativeVIZ Consulting