Automatically generating narrative charts for comic books, based on xkcd‘s Movie Narrative Charts
Development: Ongoing

  • Description

    The narrative chart is a specialized instance of a Sankey diagram, famously used to visualize Napoleon’s invasion of Russia [link].

    If you haven’t seen the xkcd charts: “The horizontal axis is time. The vertical grouping of the lines represents which characters are together at a given time”. (Napoleon’s Sankey diagram overlays a map, so the horizontal axis doesn’t exactly represent time in that case.) Hover over the grey-colored nodes for the establishing shot of that event.

  • Details

    Input: A comic book (a collection of ordered images, each representing a page).
    Output: Pretty charts like the ones above.

    The parent project, an annotation tool for comic books, gives us the raw data needed to generate the charts.

    1. The Data

      The comic book is annotated with metadata in the annotation tool that we’re developing. For the purpose of generating narrative charts, these are the relevant pieces of information:

      • Panel outlines: The frames in each page. For comics that have a simple layout with well-defined gutters (e.g. the comics whose charts are displayed above, at the time of writing: Lucky Luke, Dr. McNinja, Tintin), this step is automated. Otherwise, the outlines are manually drawn over the panels.
      • Reading order: For the moment, manually specified. However, most comics (in fact, almost all of the ones encountered during development) stick to the natural reading order (left-to-right, top-to-bottom), so automating this step is definitely on the to-do list.
      • Characters: The characters present in each “scene” of the comic. We’ve experimented a fair bit with face detection and managed to achieve reasonable performance (hit rates between 0.68 and 0.93, and false-alarm rates between 0.08 and 0.24), but more on that another day. A scene is a range of consecutive panels, and a character need only appear in one of the scene’s panels to be included in that scene, so we can be forgiving about a suboptimal hit rate. As for the false-alarm rate: face detection is followed by face recognition, and it should be unlikely for a not-a-face to be recognized as the face of a specific character, so my speculation is that the false alarms will be filed under “unknown” and discarded manually or automatically.
      • Starting panels for scenes: Specified manually with a “new scene” tag. This may be the one thing that would be difficult to automate. The start of a new scene will usually correlate with some other parameters, like the size of the panel relative to the other panels, the use of narration, etc., but a universal correlation (i.e. one that can be found in all comic books) may not exist.
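
      Automating the natural reading order mentioned above could start from something as simple as grouping the panel outlines into rows and reading each row left to right. A minimal sketch, where the (x, y, w, h) bounding-box format and the row_tolerance threshold are assumptions rather than part of the annotation tool:

```python
def reading_order(panels, row_tolerance=20):
    """Sort panel outlines into a natural reading order.

    panels: (x, y, w, h) bounding boxes -- an assumed format.
    row_tolerance: maximum difference (in pixels) between top edges for
    two panels to count as being on the same row; a guessed threshold.
    """
    rows = []
    for panel in sorted(panels, key=lambda p: p[1]):  # by top edge, downwards
        if rows and abs(rows[-1][0][1] - panel[1]) <= row_tolerance:
            rows[-1].append(panel)  # same row as the previous panel
        else:
            rows.append([panel])    # start a new row
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda p: p[0]))  # left to right
    return ordered
```

      Pages with staggered or overlapping panels would of course defeat this, which is part of why the step is still manual.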
    2. Character Clustering

      As you can probably tell, the clustering of the characters plays the primary role in the generation of the layout.

      The data from the annotation tool is run through a Python script that goes through the comic panel by panel and calculates a co-occurrence matrix for the characters. Insignificant characters (by our definition, those appearing in fewer than two scenes) are discarded.

      We’ve seen much better performance using scene co-occurrence than panel co-occurrence, which stands to reason: for instance, a long dialogue between two characters implies that those characters are probably in the same group, but could give us a panel co-occurrence of zero (in the extreme case) if the panels alternately switch between the characters to show whoever is speaking at the time.

      The co-occurrence matrix is then normalized, converted to a distance matrix, and clustered with DBSCAN.
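
      The pipeline up to the distance matrix can be sketched roughly as follows. This is pure Python and the data layout is an assumption; in practice something like scikit-learn’s DBSCAN with a precomputed metric would then consume the result:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(scenes, min_scenes=2):
    """Build a scene-level co-occurrence matrix.

    scenes: one list of character ids per scene (an assumed layout).
    Characters appearing in fewer than min_scenes scenes are discarded.
    Returns (chars, counts), where counts[a][b] is the number of scenes
    in which characters a and b appear together.
    """
    appearances = defaultdict(int)
    for scene in scenes:
        for c in set(scene):
            appearances[c] += 1
    chars = sorted(c for c, n in appearances.items() if n >= min_scenes)
    counts = {a: {b: 0 for b in chars} for a in chars}
    for scene in scenes:
        present = [c for c in set(scene) if c in counts]
        for a, b in combinations(present, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return chars, counts

def distance_matrix(chars, counts):
    """Normalize co-occurrence to [0, 1] and convert to distances
    (1 - normalized co-occurrence), with a zero diagonal."""
    peak = max((counts[a][b] for a in chars for b in chars), default=1) or 1
    return {a: {b: 0.0 if a == b else 1.0 - counts[a][b] / peak
                for b in chars} for a in chars}
```

      From here, scikit-learn’s DBSCAN accepts exactly such a distance matrix (as a 2-D array, with metric="precomputed").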

    3. Generating the layout

      The Python script outputs a JSON file containing descriptions of the scenes (together with the establishing shot of each scene). Sample:

      {
          "scenes": [
              {
                  "duration": 5, 
                  "start": 0, 
                  "chars": [ 0, 2 ], 
                  "id": 0
              }, 
              {
                  "duration": 22, 
                  "start": 5, 
                  "chars": [ 0, 16, 17, 1 ], 
                  "id": 1
              }, 
              {
                  "duration": 11, 
                  "start": 27, 
                  "chars": [ 8, 1 ], 
                  "id": 2
              }
          ]
      }  
      

      The character definitions are read from another file (mainly because we need them for more than one thing).
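
      For illustration, here the sample above is parsed back in Python; nothing beyond the structure shown is assumed. The x-extent of each scene node follows directly from its start and duration:

```python
import json

# The sample from above; "chars" and "id" are carried through as-is.
sample = """
{"scenes": [
    {"duration": 5,  "start": 0,  "chars": [0, 2],         "id": 0},
    {"duration": 22, "start": 5,  "chars": [0, 16, 17, 1], "id": 1},
    {"duration": 11, "start": 27, "chars": [8, 1],         "id": 2}
]}
"""
scenes = json.loads(sample)["scenes"]

# x-extent of each scene node; note that consecutive scenes tile the
# timeline (each starts where the previous one ended).
extents = [(s["start"], s["start"] + s["duration"]) for s in scenes]
```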

      The x-coordinates of the scene nodes are determined by the start and duration of each scene; the real challenge is determining the y-coordinates. One goal is to position the scene nodes such that the number of link crossings is minimized. Even the seemingly simpler problem of one-sided crossing minimization in bipartite graphs is NP-hard (for degrees four and higher), so an exact solution is far-fetched. Instead, we rely on heuristics developed through trial and error, together with staring at the xkcd charts.

      • Fixed cluster positions: Each cluster of characters gets its own range on the y-axis. The y-coordinate of a scene node then lies within the y-range of the median cluster of the characters appearing in the scene. Ties are broken according to the median clusters of the scenes from which the links are incoming.
      • Cluster sorting: For now, the clusters are sorted such that the densest ones are as far away from each other as possible (in order to avoid cluttering and maximize utilization of the space available). Ideally, we’d cluster the clusters and base the ordering on that (also on the to-do list).
      • Scene positions within the cluster: Within the y-range of a cluster, each character that appears in a scene whose median cluster is that cluster gets a unique y-position. Now, there are several ways to go about determining the exact position of a node within the cluster’s y-range. We could average the cluster-positions of the characters appearing in it (i.e. the positions of the characters within that cluster), or apply the heuristic we initially used to determine the y-range on a smaller scale. Both of these ideas resulted in ugly cluttering, which in retrospect should’ve been expected: what’s common to all the scenes placed in cluster x’s range is that they’re all dominated by characters from cluster x. Therefore, when we take into account the within-cluster positions of every character in the scene, they all end up at approximately the same y-position, causing clutter. Currently, we average the positions of all the non-x-cluster characters in the scene if any exist, and of the x-cluster characters otherwise.
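
      The y-positioning heuristics just described can be sketched roughly as follows. The data layout and the [0, 1] within-cluster position encoding are assumptions, and tie-breaking via incoming links is omitted for brevity:

```python
def scene_y(scene_chars, char_cluster, cluster_range, char_pos):
    """Place a scene node on the y-axis, per the heuristics above.

    scene_chars:   character ids present in the scene
    char_cluster:  char id -> cluster id
    cluster_range: cluster id -> (y_min, y_max), the cluster's fixed band
    char_pos:      char id -> position in [0, 1] within its own cluster's
                   band (an assumed encoding)
    """
    clusters = sorted(char_cluster[c] for c in scene_chars)
    home = clusters[len(clusters) // 2]  # median cluster of the scene
    y_min, y_max = cluster_range[home]
    # Average the positions of the characters from *other* clusters if
    # any exist; otherwise fall back to the home cluster's own characters.
    outsiders = [c for c in scene_chars if char_cluster[c] != home]
    anchors = outsiders if outsiders else list(scene_chars)
    frac = sum(char_pos[c] for c in anchors) / len(anchors)
    return y_min + frac * (y_max - y_min)
```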

      The above is handled by a JavaScript file that reads the JSON from the Python script and draws the chart with d3 (which is just the best!).
