Theater vs. Streaming

Data Critique

The Box Office Mojo dataset includes total gross revenue (daily and aggregated), number of releases per day, top-performing film per day, and comparative metrics. The data is primarily quantitative and time-based, and it focuses on theatrical performance and financial success. The Netflix dataset, by comparison, is metadata-heavy. It focuses on the content’s characteristics rather than its performance. Specifically, the Netflix dataset includes title, type (movie or TV show), director, main cast, country of production, release year, genre, content rating, duration, and, most important for our purposes, the date each title was added to Netflix. These two datasets represent two different ways of measuring media success.

Together, these datasets will allow us to analyze industry-level trends such as the shift from theatrical releases to streaming, the growth of TV versus movie production on streaming, and changes in release strategies. We also want to examine the differences in genres, ratings, and formats between theatrical and streaming releases, as well as temporal patterns in box office revenue and changes in the Netflix catalog. Overall, our goal is to tell the story of how entertainment distribution and success metrics have evolved over time.
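As a rough illustration of the temporal comparison described above, the sketch below totals box office gross per year and counts Netflix additions per year and type, then lines the two up by year. The field names (`date`, `total_gross`, `type`, `date_added`), the date formats, and the sample rows are all assumptions made for illustration, not values taken from either dataset.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample rows. Field names and date formats are assumptions,
# not a confirmed schema for either dataset.
box_office_days = [
    {"date": "2019-07-04", "total_gross": 30_000_000},
    {"date": "2019-12-25", "total_gross": 45_000_000},
    {"date": "2020-07-04", "total_gross": 1_000_000},
]
netflix_titles = [
    {"title": "A", "type": "Movie", "date_added": "September 25, 2019"},
    {"title": "B", "type": "TV Show", "date_added": "July 1, 2020"},
    {"title": "C", "type": "Movie", "date_added": "August 14, 2020"},
    {"title": "D", "type": "TV Show", "date_added": ""},  # missing date
]


def yearly_gross(rows):
    """Sum daily gross revenue into one total per calendar year."""
    totals = defaultdict(int)
    for row in rows:
        year = datetime.strptime(row["date"], "%Y-%m-%d").year
        totals[year] += row["total_gross"]
    return dict(totals)


def yearly_additions(rows):
    """Count catalog additions per (year, type), skipping rows with no date."""
    counts = Counter()
    for row in rows:
        raw = row["date_added"].strip()
        if not raw:
            continue
        year = datetime.strptime(raw, "%B %d, %Y").year
        counts[(year, row["type"])] += 1
    return dict(counts)


def side_by_side(gross, additions):
    """Return (year, gross, movies added, TV shows added) tuples by year.

    A year with no box office rows gets None rather than 0, so a gap in
    the data is not mistaken for zero revenue.
    """
    years = sorted(set(gross) | {year for year, _ in additions})
    return [
        (
            year,
            gross.get(year),
            additions.get((year, "Movie"), 0),
            additions.get((year, "TV Show"), 0),
        )
        for year in years
    ]


if __name__ == "__main__":
    table = side_by_side(
        yearly_gross(box_office_days), yearly_additions(netflix_titles)
    )
    for row in table:
        print(row)
```

The two series are placed next to each other by year but are not combined into a single score, since revenue and catalog counts measure different things.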

Despite the two datasets’ strengths and the ways they complement each other, there are still some blind spots. First, we cannot directly compare financial success with streaming success, because the Box Office Mojo data records revenue while the Netflix data records no performance figures at all. We also cannot measure cultural impact or audience reception, because neither dataset includes popularity metrics. The Box Office Mojo dataset also does not include production budgets, which would have given us more insight into whether movies have become more or less profitable. It seems likely that these omissions are not random and instead reflect what companies choose to disclose or conceal. Furthermore, both datasets are biased toward visibility. The Netflix dataset covers only one streaming service, and the Box Office Mojo dataset gives the most attention to films with tracked theatrical releases. This leaves out independent films as well as direct-to-streaming releases. The biggest issue is the lack of a shared measure of “success” across platforms. People define “success” in different ways, however, so it is incredibly difficult to measure.

The Box Office Mojo data consists of aggregated box office revenue reported by film studios and distributors. The data updates daily based on industry reporting. Because it relies on industry self-reporting, studios control what is released. Box Office Mojo is owned by IMDb, which is an Amazon subsidiary, and it is backed by corporate infrastructure and advertising. This dataset’s ontology frames success as revenue and rankings. It treats theatrical releases as measurable and important, and it reinforces a capitalist model of success.

The Netflix dataset, on the other hand, was compiled from Flixable, a third-party catalog tracker that draws on Netflix’s public catalog. It was curated and uploaded to Kaggle by Shivam Bansal, an independent user. From what we can find, the dataset was made without any institutional funding. This dataset’s ontology frames content as catalog entries, with a focus on metadata. It emphasizes quantity and diversity over success. It erases performance metrics and makes all content appear equal. Unfortunately, the Kaggle dataset stopped updating in 2021, but Flixable is still running if we want to expand our dataset.

If these datasets were our only sources, we might conclude that the film industry is driven primarily by profit and platform availability. We would also be left trying to compare streaming and theaters using only categories and revenue. However, there is a difference between access and engagement: a title being available on a platform does not tell us whether anyone watched it. Furthermore, we would miss the human element. The human experience is an essential part of consuming media. These datasets alone also would not shed much light on the structural inequalities of the entertainment industry.