Welcome to my very first #tidytuesday blog post! In this post I will showcase the new per-operation grouping functionality released in dplyr 1.1.0. I also want to demonstrate one of my favorite lesser-known dplyr tricks!
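Before anything else, let's load the data and see what we're working with. Here's a sketch of the setup, assuming the files from the 2023-02-07 TidyTuesday release (the left_join() just pulls in the full company names, and summarize() already sneaks in the new .by argument):

library(tidyverse)

# TidyTuesday 2023-02-07: Big Tech Stock Prices
tt_path <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-02-07/"
big_tech_stock_prices <- read_csv(paste0(tt_path, "big_tech_stock_prices.csv"))
big_tech_companies <- read_csv(paste0(tt_path, "big_tech_companies.csv"))

# count observations per company, grouping per-operation with .by
big_tech_stock_prices |>
  left_join(big_tech_companies, by = "stock_symbol") |>
  summarize(n = n(), .by = c(company, stock_symbol))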
# A tibble: 14 × 3
   company                                     stock_symbol     n
   <chr>                                       <chr>        <int>
 1 Apple Inc.                                  AAPL          3271
 2 Adobe Inc.                                  ADBE          3271
 3 Amazon.com, Inc.                            AMZN          3271
 4 Salesforce, Inc.                            CRM           3271
 5 Cisco Systems, Inc.                         CSCO          3271
 6 Alphabet Inc.                               GOOGL         3271
 7 International Business Machines Corporation IBM           3271
 8 Intel Corporation                           INTC          3271
 9 Meta Platforms, Inc.                        META          2688
10 Microsoft Corporation                       MSFT          3271
11 Netflix, Inc.                               NFLX          3271
12 NVIDIA Corporation                          NVDA          3271
13 Oracle Corporation                          ORCL          3271
14 Tesla, Inc.                                 TSLA          3148
Note: Meta and Tesla did not go public until after Jan 2010, so they have slightly less data.
Now let’s visualize the stocks as simple faceted line charts.
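Here's a sketch of that plot; the free y scales are my own assumption, since the prices sit on very different ranges:

big_tech_stock_prices |>
  ggplot(aes(date, adj_close)) +
  geom_line() +
  facet_wrap(vars(stock_symbol), scales = "free_y") +
  labs(x = NULL, y = NULL)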
With the exception of IBM, each stock peaks around the end of 2021 and declines thereafter.
Let’s zoom in on the peaks by plotting them all on the same axes. Here we use the new by argument to slice_max() to find the maximum adj_close for each stock_symbol.
big_tech_stock_prices |>
  filter(stock_symbol != "IBM") |>
  slice_max(adj_close, by = stock_symbol) |>
  ggplot(aes(date, adj_close, color = stock_symbol)) +
  geom_point() +
  ggrepel::geom_text_repel(aes(label = stock_symbol), size = 3, vjust = -0.75) +
  scale_x_date(labels = scales::label_date_short(), date_breaks = "month") +
  labs(title = "Peak Stock Prices", x = NULL, y = NULL) +
  theme(legend.position = "none")
How steep of a decline do these stocks see in the weeks and months following their peak? How do they compare to each other?
To answer this question, we need to filter each stock down to only the data following its peak. This may sound trivial, but it’s trickier than it looks: since each stock reaches its peak on a different date, we can’t simply filter the whole data set by a single cutoff.
One approach would be to make a separate tibble containing just the max price dates, join it back with the original, and filter the dates.
peak_price_dates <- big_tech_stock_prices |>
  slice_max(adj_close, by = stock_symbol) |>
  select(stock_symbol, peak_date = date, peak_price = adj_close)

big_tech_stock_prices |>
  inner_join(peak_price_dates, by = "stock_symbol") |>
  filter(date >= peak_date) |>
  # for demonstration purposes
  slice_min(date, n = 3, by = stock_symbol) |>
  select(stock_symbol, date, adj_close)
This solution works, but there’s a better way (in my opinion) that doesn’t require a separate tibble. The method comes from a base-R concept called subsetting.
Subsetting can be used to filter a vector or data frame by some condition, much like dplyr::filter(). Instead of a function call, we use square brackets ([]).
vec <- 0:20

# subset to get even numbers
vec[vec %% 2 == 0]
[1] 0 2 4 6 8 10 12 14 16 18 20
# subset to get rows with mpg > 21
mtcars[mtcars$mpg > 21, ]
In dplyr, we can apply this same principle by subsetting inside filter(). For our case, we combine filter, subsetting, and which.max, which returns the index of a vector's largest element. Coupled with per-operation grouping, we can accomplish the entire process in a single step!
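If which.max() is new to you, here's a quick illustration with a toy vector:

x <- c(3, 9, 5)
which.max(x)    # 2: the position of the largest element
x[which.max(x)] # 9: the largest element itself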
stock_downfalls <- big_tech_stock_prices |>
  # keep only the rows on or after each stock's peak date
  filter(date >= date[which.max(adj_close)], .by = stock_symbol)

stock_downfalls |>
  ggplot(aes(date, adj_close, color = stock_symbol)) +
  geom_line() +
  ggrepel::geom_label_repel(
    aes(peak_date, peak_price, label = stock_symbol),
    data = peak_price_dates, size = 3
  ) +
  scale_x_date(labels = scales::label_date_short(), date_breaks = "month") +
  labs(x = NULL, y = NULL) +
  theme(legend.position = "none")
Now we can clearly see which stocks endured more dramatic price dips and how they fared over the following 1-2 years.
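If you'd like numbers to back that up, here's a quick summary (my own addition, leaning on .by once more) that computes each stock's peak-to-trough decline:

stock_downfalls |>
  summarize(
    peak     = max(adj_close),
    trough   = min(adj_close),
    pct_drop = 1 - trough / peak,
    .by = stock_symbol
  ) |>
  arrange(desc(pct_drop))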
Thank you so much for reading and I hope this exercise was useful. Please reach out if you have any questions or feedback!