Data Vis for Image Libraries

Quantitative and Qualitative Explorations for Digital Photo Collections

Data Visualization for Visual Assets

Digital photographs carry both qualitative and quantitative attributes. Creative content professionals filter and group digital images by these attributes to highlight patterns or segments within commercial image libraries. These techniques are valuable internally for organizing collections and may also provide innovative ways to share content with clients.

Digital Image Data

Case Study of a Digital Asset and Its Data

The image (below, left) has sold many times for different projects, as shown. The line of spreadsheet data is included below the samples.

Evan_resize

image_data

Data for Image Collections

The images for this project are part of a commercial photo library. Data for the images is stored in a dataframe and includes the following variables:

filename: asset id

subject: subject category of the image’s content

people_no_people: whether a human is depicted in the image

dominant_color: the dominant color of the image in hex value

transactions: number of times the image has been licensed

revenue: total revenue earned by the image

brightness_median: scale is 0-255

saturation_median: scale is 0-255

hue_median: scale is 0-255

collection: photographer id

rpi: return per image is the average sale price per license
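
As a minimal sketch of how the rpi variable relates to the others, assuming it is simply revenue divided by transactions: the rows below are made-up values shaped like the variables listed above, and the helper name return_per_image is hypothetical.

```python
# Hypothetical rows shaped like the variables above; the values are invented.
rows = [
    {"filename": "img_001", "transactions": 4, "revenue": 500.0},
    {"filename": "img_002", "transactions": 0, "revenue": 0.0},
]


def return_per_image(revenue, transactions):
    """Average sale price per license; None if the image was never licensed."""
    if transactions == 0:
        return None
    return revenue / transactions


# Attach the derived rpi value to each row.
for row in rows:
    row["rpi"] = return_per_image(row["revenue"], row["transactions"])
```

Returning None for unlicensed images avoids a division by zero; a real dataset might prefer 0 or a missing-value marker instead.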

Python, together with the Altair and Seaborn packages, provides methods for analyzing hundreds of thousands of these assets and their CSV data. Below are some of the methods that can be useful for DAM (Digital Asset Management) professionals, content creatives, and their licensing clients.

Plotting Jpeg Images

Jpegs for Points: Brightness and Saturation Scatter Plot

Spreadsheet and database data without thumbnail references can be limiting for creative directors who want to understand their collections visually, seeing each image mapped directly to its numeric values. This Python script substitutes jpegs for points in a scatter plot. Attributes of brightness, saturation, and hue for the picture collections were measured with ImageJ and ImagePlot from the Software Studies Initiative and recorded in CSV files. Python is then used to plot the jpegs according to any of the chosen measures.


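import os
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, OffsetImage
from PIL import Image

def absoluteFilePaths(directory):
    # Yield the absolute path of every file found under `directory`.
    for dirpath, _, filenames in os.walk(directory):
        for f in filenames:
            yield os.path.abspath(os.path.join(dirpath, f))

# `paths` is the list of jpeg paths (for example, from absoluteFilePaths);
# `x` and `y` are the measured values for those jpegs, in the same order.
images = []
for p in paths:
    img = Image.open(p)
    # Shrink each jpeg to 1/20 of its size so it can stand in for a plot point.
    # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter.
    img = img.resize((int(img.size[0] / 20), int(img.size[1] / 20)), Image.LANCZOS)
    images.append(img)

fig, ax = plt.subplots()
ax.scatter(x, y)
# Draw each thumbnail at its (x, y) position, on top of the plain scatter points.
for x0, y0, image in zip(x, y, images):
    ab = AnnotationBbox(OffsetImage(image), (x0, y0), frameon=False)
    ax.add_artist(ab)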


bright_sat_DC

Plotting jpegs for an image collection by brightness (x) and saturation (y)


Dominant Color Analysis

color_grid_1_SA

Grids of Dominant Colors for Separate Collections

Identifying dominant colors in images allows the viewer to source assets that fit a desired color scheme. Collections may have a palette that resonates with a specific client, creating an opportunity to contribute to a brand or participate in a campaign. Image colors vary with a photographer's location, techniques, and personal tastes. Below are six collections' dominant colors returned in grids.

import seaborn as sns
sns.set()
def hex_to_rgb(hex_value):
  h = hex_value.lstrip('#')
  return tuple(int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
hex_colors = [
  '#f0787e', '#f5a841', '#5ac5bc', '#ee65a3', '#f5e34b', '#640587', '#c2c36d',
  '#2e003a', '#878587', '#d3abea', '#f2a227', '#f0db08', '#148503', '#0a6940',
  '#043834', '#726edb', '#db6e6e', '#db6ecb', '#6edb91'
]
rgb_colors = list(map(hex_to_rgb, hex_colors))
sns.palplot(rgb_colors)
row_size = 15
rows = [rgb_colors[i:i + row_size] for i in range(0, len(rgb_colors), row_size)]
for row in rows:
  sns.palplot(row)
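
The article does not say how each image's dominant color was extracted. As one possible approach, and purely an assumption here, pixels can be grouped into coarse color bins and the most common bin reported as a hex value. The function name dominant_hex and the bin_size parameter are hypothetical, and the pixel list stands in for data read from a real image.

```python
from collections import Counter


def dominant_hex(pixels, bin_size=32):
    """Return the hex color of the most common coarse color bin.

    `pixels` is an iterable of (r, g, b) tuples with channels in 0-255.
    """
    def center(channel):
        # Snap a channel value to the middle of its bin.
        return min(255, (channel // bin_size) * bin_size + bin_size // 2)

    counts = Counter((center(r), center(g), center(b)) for r, g, b in pixels)
    (r, g, b), _ = counts.most_common(1)[0]
    return '#{:02x}{:02x}{:02x}'.format(r, g, b)


# Two reddish pixels and one blue pixel: the red bin wins.
sample_pixels = [(250, 10, 10), (245, 5, 12), (10, 10, 250)]
print(dominant_hex(sample_pixels))  # '#f01010'
```

Coarser bins (a larger bin_size) merge similar shades, which tends to give steadier results on photographs than counting exact pixel values.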


Image Grids for 6 Collections: AU, CT, DC, NC, PX, SA


color_grid-2

Returning Hex Values for Image Points

Replacing the points with the collected dominant color values creates a more meaningful visual color connection for the viewer. The measures of brightness, hue, and saturation are computed from RGB values and scaled to a range from 0 to 255.
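
To illustrate how such measures can be derived from a color, here is a minimal sketch that converts a hex value to hue, saturation, and brightness on a 0 to 255 scale using Python's standard colorsys module. It assumes a plain HSV conversion; the exact formulas used by ImageJ and ImagePlot are not confirmed here, and the function name hex_to_hsb_255 is hypothetical.

```python
import colorsys


def hex_to_hsb_255(hex_value):
    """Convert '#rrggbb' to (hue, saturation, brightness), each scaled to 0-255."""
    h = hex_value.lstrip('#')
    # Parse the three channels and normalize them to 0.0-1.0.
    r, g, b = (int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
    return tuple(round(c * 255) for c in (hue, sat, val))


print(hex_to_hsb_255('#ff0000'))  # (0, 255, 255)
print(hex_to_hsb_255('#00ff00'))  # (85, 255, 255)
```

Pure green sits a third of the way around the hue circle, so its hue lands at 85 on the 0 to 255 scale.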

dom_color_hex

Filtering for Subject by Hex Values with Interactive Legend

Adding filtering capability to the scatter plot will facilitate our grasp of this large set of data points.


import altair as alt

# `source` is the dataframe of image data described above.
# `dom_c` is assumed to hold each image's dominant color as a hex string.
subjects = ['Animals', 'Business', 'Kids', 'Lifestyle', 'Other', 'Scenic', 'Sports']
# Clicking a subject in the legend selects it.
selection = alt.selection_multi(fields=['subject'])
# Selected points keep their own hex color (scale=None passes the value through);
# unselected points become transparent.
color = alt.condition(selection,
                      alt.Color('dom_c:N', scale=None, legend=None),
                      alt.value('transparent'))
base = alt.Chart(source, width=300, height=300).mark_square(filled=True, size=70).encode(
    x=alt.X('hue_median:Q'),
    y='saturation_median:Q',
    color=color,
    tooltip=[alt.Tooltip('filename:N'),
             alt.Tooltip('collection:N'),
             alt.Tooltip('subject:N'),
             alt.Tooltip('transactions:Q'),
             alt.Tooltip('revenue', format='$.2f')]
).interactive()
# A one-column chart of subjects that acts as the clickable legend.
legend = alt.Chart(source).mark_point().encode(
    y=alt.Y('subject:N', axis=alt.Axis(orient='right')),
    color=color
).add_selection(
    selection
)

Select Subject in Legend to Reflect Hex Values

The legend below allows the user to find hex values for the desired image subject.


Slider Filter for RPI Thresholds

Return per Image (RPI) is the average price per license sale. Dragging the slider sets an RPI threshold in US dollars; images whose RPI is at or below the threshold turn purple, while the rest stay grey.

rpi_slider = alt.binding_range(min=0, max=5000, step=1, name='rpi_threshold_USD:')
selector = alt.selection_single(name="Revenue_Per_Image", fields= ['rpi_threshold'],
                                bind=rpi_slider, init={'rpi_threshold': 0})

base = alt.Chart(source, height=400, width=400).mark_point().encode(
    x = alt.X('transactions:Q', scale=alt.Scale(zero=False,type='log')),
    y= alt.Y('revenue:Q', scale=alt.Scale(zero=False,type='sqrt')) ,

    color=alt.condition(
        alt.datum.rpi > selector.rpi_threshold,
        alt.value('slategrey'), alt.value('rebeccapurple')
    )
)
slider_filter_rpi = base.add_selection(
    selector
)

Use Slider to See RPIs and Scroll to Zoom

Scrolling to zoom is enabled with .interactive(), which lets the user drill down into specific areas of the plot.
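
The color rule behind the slider can be sketched in plain Python. This is only an illustration of the comparison in the chart's condition (rpi greater than the threshold), not Altair code; the function name point_color and the sample values are hypothetical.

```python
def point_color(rpi, threshold):
    """Mirror the chart's condition: grey above the threshold, purple otherwise."""
    return 'slategrey' if rpi > threshold else 'rebeccapurple'


# Hypothetical RPI values checked against a slider position of 100 USD.
sample_rpis = [25.0, 100.0, 340.5]
colors = [point_color(rpi, 100) for rpi in sample_rpis]
print(colors)  # ['rebeccapurple', 'rebeccapurple', 'slategrey']
```

Because the comparison is strict, an image whose RPI equals the threshold exactly is also drawn in purple.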



Collection Subject Analysis

Subject and Collection Table with Bubble Plot

Data provided in a table is concise and understandable. Sized bubbles add another dimension of data to a standard format.

bubble = alt.Chart(source, width=400, height = 400).mark_circle(size=300).encode(
    alt.X('collection:N'),
    alt.Y('subject:N'),
    color=alt.Color('subject',
                     scale=alt.Scale(scheme='tableau10'),
                     legend= None),
    size=alt.Size('sum(revenue)', scale=alt.Scale(range=[0,1500]), title= "revenue USD"),
    tooltip= [alt.Tooltip('sum(revenue)', format="$"+'.2f', title="revenue")]

).interactive()



Identifying Which Subjects Generate Most Revenue per Collection

All of the collections in this study cover multiple subjects. By organizing the collections and subjects in a table, with bubble size as a third dimension representing revenue, the viewer can identify the strengths of each collection. Hovering over a bubble shows the revenue earned in a tooltip.
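
The sum(revenue) aggregate that sizes each bubble can be sketched without any plotting library. The records below are invented for illustration, and the helper top_subject is a hypothetical name; the point is only to show revenue being totalled per (collection, subject) pair.

```python
from collections import defaultdict

# Hypothetical records; the collection ids and amounts are made up.
records = [
    {"collection": "AU", "subject": "Scenic", "revenue": 120.0},
    {"collection": "AU", "subject": "Scenic", "revenue": 80.0},
    {"collection": "AU", "subject": "Kids", "revenue": 150.0},
    {"collection": "DC", "subject": "Kids", "revenue": 50.0},
]

# Total revenue for each (collection, subject) cell of the table.
totals = defaultdict(float)
for rec in records:
    totals[(rec["collection"], rec["subject"])] += rec["revenue"]


def top_subject(collection):
    """Return the subject with the highest total revenue in a collection."""
    cells = {subj: amt for (coll, subj), amt in totals.items() if coll == collection}
    return max(cells, key=cells.get)


print(top_subject("AU"))  # 'Scenic'
```

Each key in totals corresponds to one bubble in the table, and its value to that bubble's size.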





Filtering Sales Data by Collection and Other Attributes

Plotting individual assets by times sold (x-axis) and revenue earned (y-axis) allows the viewer to see selling trends for the images.

scales = alt.selection_interval(empty='all' ,bind='scales',)
selection = alt.selection_multi(fields=['subject', 'collection'])
color = alt.condition(selection,
                      alt.Color('subject:N', legend=None,
                                    scale=alt.Scale(scheme='dark2')),
                      alt.value('transparent'))
scatter = alt.Chart(source).mark_point().encode(
    x = alt.X('transactions:Q', scale=alt.Scale(zero=False,type='log')),
    y= alt.Y('revenue:Q', scale=alt.Scale(zero=False,type='sqrt')) ,
    color= color,
).add_selection(scales).interactive()

legend = alt.Chart(source).mark_rect().encode(
    y=alt.Y('subject:N', axis=alt.Axis(orient='right')),
    x='collection:N',
    color=color
).transform_filter( scales ).add_selection(
    selection
).properties(title="select dimension")



Interactive Legend Gives User Filtering Control

The viewer has the flexibility to select subsets of photographs by subject and collection. The tooltip provides detailed information about earnings and other variables for the individual pictures (points) on hover. Multiple blocks may be selected with the Shift key.
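
The filtering behavior of the legend can be described as a simple membership test. The sketch below is a plain-Python illustration, not Altair internals: it assumes that a point stays visible when its (subject, collection) pair is among the selected blocks, and that an empty selection shows everything. The function name is_visible and the sample data are hypothetical.

```python
def is_visible(record, selected_pairs):
    """Keep a point if nothing is selected or its (subject, collection) is selected."""
    if not selected_pairs:
        return True
    return (record["subject"], record["collection"]) in selected_pairs


# Hypothetical points and a selection of two legend blocks.
points = [
    {"filename": "img_001", "subject": "Kids", "collection": "AU"},
    {"filename": "img_002", "subject": "Scenic", "collection": "AU"},
    {"filename": "img_003", "subject": "Kids", "collection": "DC"},
]
selected = {("Kids", "AU"), ("Kids", "DC")}
visible = [p["filename"] for p in points if is_visible(p, selected)]
print(visible)  # ['img_001', 'img_003']
```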



Multiple Interactive Charts and Legend

Connecting multiple charts with clickable filtering allows the user to slice data by additional variables within a large image collection.

# `selector` is assumed to be an Altair selection defined elsewhere; its definition is not shown here.
# Image count per subject, stacked by collection.
bars = alt.Chart(source).mark_bar().encode(
    alt.X('count(subject):Q', title='image count'),
    y='subject:N',
    color=alt.Color('collection:N',
                    scale=alt.Scale(scheme='tableau10'),
                    legend=None),
    tooltip=[alt.Tooltip('count(filename):Q', title='image count')],
).transform_filter(selector).add_selection(
    selector
)
# Image count for people / no people, with opacity separating the two groups.
ppl_bars = alt.Chart(source).mark_bar().encode(
    alt.X('count(people_no_people):Q', title='image count'),
    y='people_no_people:N',
    color=alt.Color('collection:N',
                    scale=alt.Scale(scheme='tableau10'),
                    legend=None),
    tooltip=[alt.Tooltip('count(filename):Q', title='image count')],
    opacity=alt.Opacity('people_no_people:N'),
).transform_filter(selector).interactive()



Filtering Across Multiple Variables

Tooltips on hover, scrolling to zoom, and clickable charts allow the user to understand the qualitative and quantitative measures of these collections. In this chart group, the metric of “people/no_people” indicates whether people are depicted in the images. Opacity distinctions and color distinctions bring more visual cues into the data visualization.