Using R: How Much Time Did I Actually Spend on Bilibili?

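A while ago I was injured and stuck in bed. With rare free time on my hands, and with "lying down is no good for studying" as my excuse, I went on an entertainment binge. I got so hooked on Bilibili that I could hardly pull myself away, and looking back I realized I had spent quite a lot of time on the site, so I decided to try to add it all up.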

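Since the goal is to summarize and analyze my activity on Bilibili, getting the data is the most important step. Bilibili does not appear to offer a public API for this, but fortunately some helpful users have shared their findings: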

SocialSisterYi/bilibili-API-collect

SocialSisterYi/bilibili-API-collect (hereafter "the project") collects and organizes Bilibili APIs from the web, mobile, TV and other clients into a fairly comprehensive unofficial API reference.

This post builds on the project and uses R to analyze and summarize my own Bilibili viewing history.

1 Setting Up Login Information

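Accessing the viewing history obviously requires login information. Following the project's notes on API authentication and on basic login information, the first step is to set the cookies. I assumed a simple httr::GET + httr::set_cookies would take care of it, but setting the cookies alone ended up costing me a long time.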

According to the API authentication notes, accessing Bilibili requires four cookies: DedeUserID, DedeUserID__ckMd5, SESSDATA and bili_jct.

That part is easy: open Chrome, press F12 for the developer tools, and read the values off the Application tab.

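However, the SESSDATA and bili_jct values obtained this way are already escaped (percent-encoded). When httr::set_cookies builds the cookies it escapes them again by default, and the request then fails with an error. I have submitted a PR to httr to try to address this; whether and when it will be merged, I cannot say.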

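Since the escaping is forced, we can simply hand httr::set_cookies values that have already been unescaped. This is done with curl::curl_unescape; httr::set_cookies itself performs its conversion through a vectorized call to curl::curl_escape. Concretely, the code is as follows: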

library("httr")
cookies <-
  httr::set_cookies(
    DedeUserID = rstudioapi::askForPassword("DedeUserID"),
    DedeUserID__ckMd5 = rstudioapi::askForPassword("DedeUserID__ckMd5"),
    SESSDATA = curl::curl_unescape(rstudioapi::askForPassword("SESSDATA")),
    bili_jct = curl::curl_unescape(rstudioapi::askForPassword("bili_jct"))
  )

In everything that follows, it is enough to attach cookies to each request.
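To make the double-escaping issue concrete, here is a minimal sketch using the curl package. The cookie value is made up for illustration and is not a real SESSDATA. The point is only that unescaping first lets a later escape reproduce the string shown by the browser, whereas escaping the already-escaped value mangles it.

```r
library("curl")

# Hypothetical, made-up value in the escaped form shown by the browser
escaped <- "abc123%2C1650000000%2Cdef45%2A21"

# Undo the percent-encoding: "%2C" becomes "," and "%2A" becomes "*"
unescaped <- curl::curl_unescape(escaped)

# Escaping the unescaped value gives back the browser's string
once <- curl::curl_escape(unescaped)

# Escaping the already-escaped value turns every "%" into "%25"
twice <- curl::curl_escape(escaped)

print(unescaped)
print(once == escaped)
print(twice)
```

The `twice` value is the kind of string that ends up in the request when the browser's value is passed to httr::set_cookies unchanged.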

2 Fetching the History

First comes querying the history. The project's chapter on history records documents two APIs, a new one and an old one:

http://api.bilibili.com/x/web-interface/history/cursor

http://api.bilibili.com/x/v2/history

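The new API can return several kinds of viewing records, including videos, live streams and articles, but I only watch videos on Bilibili, so the old API is sufficient. The old API can also return more history records, which makes it particularly well suited to this case.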

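To fetch as many records as possible, the pn parameter is used to control the offset into the history: each increment of pn moves the request 300 records further back in time. In my experiments, this case returned data only up to pn = 4, so we issue 4 requests and merge the results. The records themselves are in the $data element of the parsed response.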

library("jsonlite")
library("pillar")
library("purrr")
library("dplyr")
library("tibble")

pn_ls <-
  c(1:4)

history_resp_ls <-
  map(pn_ls,
      function(pn){
        history_resp <-
          httr::GET(url = 
                      "http://api.bilibili.com/x/v2/history",
                    config = cookies,
                    query = list(pn = pn))
        
        history_content <-
          httr::content(history_resp, as = "text", encoding = "UTF-8")
        
        # The response of GET is a json
        history_from_json <-
          jsonlite::fromJSON(history_content)
        
        # The history records are in `data`
        history_from_json$data
        }
      )

history_tb <-
  reduce(history_resp_ls, bind_rows) %>% 
  as_tibble()

# glimpse(history_tb)
head(history_tb)
## # A tibble: 6 × 37
##         aid videos   tid tname   copyright pic   title pubdate  ctime desc  state
##       <int>  <int> <int> <chr>       <int> <chr> <chr>   <int>  <int> <chr> <int>
## 1 632936267      1   212 美食侦…         1 http… 还是…  1.63e9 1.63e9 "-"       0
## 2 721202142      1   228 人文历…         1 http… 中国…  1.63e9 1.63e9 "移…      0
## 3 378655043      1   176 汽车生…         1 http… 什么…  1.63e9 1.63e9 "诈…      0
## 4 676243719      1    76 美食制…         1 http… 这是…  1.63e9 1.63e9 "日…      0
## 5 758813440      1   138 搞笑            1 http… 太顶…  1.62e9 1.62e9 "吴…      0
## 6 847480646      1    28 原创音…         1 http… “梗…   1.63e9 1.63e9 "引…      0
## # … with 26 more variables: duration <int>, rights <df[,12]>, owner <df[,3]>,
## #   stat <df[,11]>, dynamic <chr>, cid <int>, dimension <df[,3]>,
## #   short_link_v2 <chr>, up_from_v2 <int>, favorite <lgl>, type <int>,
## #   sub_type <int>, device <int>, page <df[,8]>, count <int>, progress <int>,
## #   view_at <int>, kid <int>, business <chr>, redirect_link <chr>, bvid <chr>,
## #   mission_id <int>, season_id <int>, redirect_url <chr>, bangumi <df[,7]>,
## #   cheese <df[,5]>
summary(history_tb)
##       aid                videos          tid           tname          
##  Min.   :  2599625   Min.   : 1.0   Min.   : 17.0   Length:1200       
##  1st Qu.:336101652   1st Qu.: 1.0   1st Qu.: 31.0   Class :character  
##  Median :587790294   Median : 1.0   Median :138.0   Mode  :character  
##  Mean   :560113126   Mean   : 1.2   Mean   :124.2                     
##  3rd Qu.:763318868   3rd Qu.: 1.0   3rd Qu.:212.0                     
##  Max.   :976237705   Max.   :49.0   Max.   :239.0                     
##                                                                       
##    copyright         pic               title              pubdate         
##  Min.   :1.000   Length:1200        Length:1200        Min.   :1.437e+09  
##  1st Qu.:1.000   Class :character   Class :character   1st Qu.:1.614e+09  
##  Median :1.000   Mode  :character   Mode  :character   Median :1.631e+09  
##  Mean   :1.142                                         Mean   :1.619e+09  
##  3rd Qu.:1.000                                         3rd Qu.:1.632e+09  
##  Max.   :2.000                                         Max.   :1.635e+09  
##                                                                           
##      ctime               desc               state             duration       
##  Min.   :1.497e+09   Length:1200        Min.   :-100.000   Min.   :     9.0  
##  1st Qu.:1.614e+09   Class :character   1st Qu.:   0.000   1st Qu.:   120.0  
##  Median :1.631e+09   Mode  :character   Median :   0.000   Median :   364.0  
##  Mean   :1.620e+09                      Mean   :  -0.755   Mean   :   737.6  
##  3rd Qu.:1.632e+09                      3rd Qu.:   0.000   3rd Qu.:   713.0  
##  Max.   :1.635e+09                      Max.   :   0.000   Max.   :132074.0  
##                                                                              
##       rights.bp           rights.elec        rights.download       rights.movie          rights.pay           rights.hd5        rights.no_reprint     rights.autoplay      rights.ugc_pay     rights.is_cooperation  rights.ugc_pay_preview  rights.no_background
##  Min.   :0            Min.   :0            Min.   :0            Min.   :0            Min.   :0.0000       Min.   :0.0000       Min.   :0.0000000    Min.   :0.0000       Min.   :0            Min.   :0.0000       Min.   :0            Min.   :0                
##  1st Qu.:0            1st Qu.:0            1st Qu.:0            1st Qu.:0            1st Qu.:0.0000       1st Qu.:0.0000       1st Qu.:1.0000000    1st Qu.:1.0000       1st Qu.:0            1st Qu.:0.0000       1st Qu.:0            1st Qu.:0                
##  Median :0            Median :0            Median :0            Median :0            Median :0.0000       Median :0.0000       Median :1.0000000    Median :1.0000       Median :0            Median :0.0000       Median :0            Median :0                
##  Mean   :0            Mean   :0            Mean   :0            Mean   :0            Mean   :0.0025       Mean   :0.4375       Mean   :0.8316667    Mean   :0.9925       Mean   :0            Mean   :0.0375       Mean   :0            Mean   :0                
##  3rd Qu.:0            3rd Qu.:0            3rd Qu.:0            3rd Qu.:0            3rd Qu.:0.0000       3rd Qu.:1.0000       3rd Qu.:1.0000000    3rd Qu.:1.0000       3rd Qu.:0            3rd Qu.:0.0000       3rd Qu.:0            3rd Qu.:0                
##  Max.   :0            Max.   :0            Max.   :0            Max.   :0            Max.   :1.0000       Max.   :1.0000       Max.   :1.0000000    Max.   :1.0000       Max.   :0            Max.   :1.0000       Max.   :0            Max.   :0                
##                                                                                                                                                                                                                                                                  
##       owner.mid             owner.name            owner.face     
##  Min.   :     28457    Length:1200           Length:1200         
##  1st Qu.:  23947287    Class :character      Class :character    
##  Median : 337521240    Mode  :character      Mode  :character    
##  Mean   : 467453220    NA                    NA                  
##  3rd Qu.: 544336675    NA                    NA                  
##  Max.   :2105467274    NA                    NA                  
##                                                                  
##       stat.aid             stat.view          stat.danmaku          stat.reply          stat.favorite          stat.coin           stat.share          stat.now_rank        stat.his_rank          stat.like          stat.dislike    
##  Min.   :  2599625    Min.   :     378     Min.   :     0.00    Min.   :    0.00     Min.   :     0.0     Min.   :     0.0     Min.   :     0.00    Min.   :0            Min.   :  0.0000     Min.   :      0.0    Min.   :0          
##  1st Qu.:336101652    1st Qu.:  142812     1st Qu.:   292.00    1st Qu.:  236.00     1st Qu.:   626.2     1st Qu.:   473.8     1st Qu.:    97.75    1st Qu.:0            1st Qu.:  0.0000     1st Qu.:   5843.8    1st Qu.:0          
##  Median :587790294    Median :  496958     Median :  1145.00    Median :  691.00     Median :  2543.0     Median :  2616.0     Median :   598.00    Median :0            Median :  0.0000     Median :  19623.0    Median :0          
##  Mean   :560113126    Mean   : 1237879     Mean   :  4531.27    Mean   : 1697.03     Mean   : 12988.9     Mean   : 22294.0     Mean   :  4892.27    Mean   :0            Mean   :  6.7825     Mean   :  68178.2    Mean   :0          
##  3rd Qu.:763318868    3rd Qu.: 1451116     3rd Qu.:  4072.50    3rd Qu.: 1827.50     3rd Qu.:  8584.0     3rd Qu.: 13353.0     3rd Qu.:  2610.75    3rd Qu.:0            3rd Qu.:  0.0000     3rd Qu.:  71615.8    3rd Qu.:0          
##  Max.   :976237705    Max.   :26865323     Max.   :175789.00    Max.   :37253.00     Max.   :730496.0     Max.   :774376.0     Max.   :279676.00    Max.   :0            Max.   :819.0000     Max.   :1276872.0    Max.   :0          
##                                                                                                                                                                                                                                       
##    dynamic               cid           
##  Length:1200        Min.   :  4062651  
##  Class :character   1st Qu.:300695667  
##  Mode  :character   Median :401334872  
##                     Mean   :348783941  
##                     3rd Qu.:412164764  
##                     Max.   :428187561  
##                                        
##    dimension.width     dimension.height     dimension.rotate  
##  Min.   : 318.000     Min.   : 240.000     Min.   :0.0000000  
##  1st Qu.:1280.000     1st Qu.:1080.000     1st Qu.:0.0000000  
##  Median :1920.000     Median :1080.000     Median :0.0000000  
##  Mean   :1839.597     Mean   :1231.498     Mean   :0.0016667  
##  3rd Qu.:1920.000     3rd Qu.:1080.000     3rd Qu.:0.0000000  
##  Max.   :4096.000     Max.   :4320.000     Max.   :1.0000000  
##                                                               
##  short_link_v2        up_from_v2     favorite            type       
##  Length:1200        Min.   : 1.00   Mode :logical   Min.   : 3.000  
##  Class :character   1st Qu.: 8.00   FALSE:1167      1st Qu.: 3.000  
##  Mode  :character   Median : 9.00   TRUE :33        Median : 3.000  
##                     Mean   :15.96                   Mean   : 3.012  
##                     3rd Qu.:20.00                   3rd Qu.: 3.000  
##                     Max.   :36.00                   Max.   :10.000  
##                     NA's   :1000                                    
##     sub_type           device     
##  Min.   :0.00000   Min.   :1.000  
##  1st Qu.:0.00000   1st Qu.:1.000  
##  Median :0.00000   Median :1.000  
##  Mean   :0.01833   Mean   :2.118  
##  3rd Qu.:0.00000   3rd Qu.:4.000  
##  Max.   :7.00000   Max.   :4.000  
##                                   
##                           page.cid                                                   page.page                                                   page.from                                                   page.part                                                 page.duration                                                  page.vid                                                  page.weblink                         page.dimension.width     dimension.height    dimension.rotate 
##  Min.   :  4062651                                           Min.   : 1.000000                                           Length:1200                                                 Length:1200                                                 Min.   :    7.00                                            Length:1200                                                 Length:1200                                                 Min.   : 318.000    Min.   : 240.000    Min.   :0.000000      
##  1st Qu.:300266664                                           1st Qu.: 1.000000                                           Class :character                                            Class :character                                            1st Qu.:  119.00                                            Class :character                                            Class :character                                            1st Qu.:1280.000    1st Qu.:1080.000    1st Qu.:0.000000      
##  Median :401174042                                           Median : 1.000000                                           Mode  :character                                            Mode  :character                                            Median :  340.00                                            Mode  :character                                            Mode  :character                                            Median :1920.000    Median :1080.000    Median :0.000000      
##  Mean   :348207322                                           Mean   : 1.026072                                           NA                                                          NA                                                          Mean   :  487.39                                            NA                                                          NA                                                          Mean   :1838.506    Mean   :1229.653    Mean   :0.001682      
##  3rd Qu.:412109363                                           3rd Qu.: 1.000000                                           NA                                                          NA                                                          3rd Qu.:  694.00                                            NA                                                          NA                                                          3rd Qu.:1920.000    3rd Qu.:1080.000    3rd Qu.:0.000000      
##  Max.   :428187561                                           Max.   :17.000000                                           NA                                                          NA                                                          Max.   :10943.00                                            NA                                                          NA                                                          Max.   :4096.000    Max.   :4320.000    Max.   :1.000000      
##  NA's   :11                                                  NA's   :11                                                  NA                                                          NA                                                          NA's   :11                                                  NA                                                          NA                                                          NA's   :11          NA's   :11          NA's   :11            
##      count           progress         view_at               kid           
##  Min.   : 1.000   Min.   :  -1.0   Min.   :1.630e+09   Min.   :    27040  
##  1st Qu.: 1.000   1st Qu.:  -1.0   1st Qu.:1.632e+09   1st Qu.:335887840  
##  Median : 1.000   Median :   1.0   Median :1.632e+09   Median :587030088  
##  Mean   : 1.201   Mean   : 122.3   Mean   :1.633e+09   Mean   :556435075  
##  3rd Qu.: 1.000   3rd Qu.: 101.2   3rd Qu.:1.633e+09   3rd Qu.:763218471  
##  Max.   :49.000   Max.   :2334.0   Max.   :1.635e+09   Max.   :976237705  
##  NA's   :8                                                                
##    business         redirect_link          bvid             mission_id    
##  Length:1200        Length:1200        Length:1200        Min.   : 10923  
##  Class :character   Class :character   Class :character   1st Qu.: 28604  
##  Mode  :character   Mode  :character   Mode  :character   Median : 84241  
##                                                           Mean   : 82051  
##                                                           3rd Qu.:122069  
##                                                           Max.   :208463  
##                                                           NA's   :542     
##    season_id     redirect_url      
##  Min.   :  107   Length:1200       
##  1st Qu.:  562   Class :character  
##  Median : 3491   Mode  :character  
##  Mean   :10602                     
##  3rd Qu.:24327                     
##  Max.   :32364                     
##  NA's   :1083                      
##                                                                          bangumi.ep_id                                                                                                                                                   bangumi.title                                                                                                                                                 bangumi.long_title                                                                                                                                            bangumi.episode_status                                                                                                                                              bangumi.follow                                                                                                                                                  bangumi.cover                                                                           bangumi.season.season_id      season.title     season.season_status   season.is_finish   season.total_count  season.newest_ep_id  season.newest_ep_index  season.season_type
##  Min.   :330566.0                                                                                                                                                Length:1200                                                                                                                                                     Length:1200                                                                                                                                                     Min.   : 2.0                                                                                                                                                    Min.   :0                                                                                                                                                       Length:1200                                                                                                                                                     Min.   :27040       Length:1200         Min.   : 2.0000     Min.   :0.0000      Min.   :-1.0000     Min.   :330566.0    Length:1200         Min.   :2.0000                  
##  1st Qu.:339391.0                                                                                                                                                Class :character                                                                                                                                                Class :character                                                                                                                                                1st Qu.: 2.0                                                                                                                                                    1st Qu.:0                                                                                                                                                       Class :character                                                                                                                                                1st Qu.:31395       Class :character    1st Qu.: 2.0000     1st Qu.:0.5000      1st Qu.:-1.0000     1st Qu.:359631.0    Class :character    1st Qu.:2.0000                  
##  Median :384953.0                                                                                                                                                Mode  :character                                                                                                                                                Mode  :character                                                                                                                                                Median : 2.0                                                                                                                                                    Median :0                                                                                                                                                       Mode  :character                                                                                                                                                Median :34041       Mode  :character    Median : 8.0000     Median :1.0000      Median : 1.0000     Median :416137.0    Mode  :character    Median :3.0000                  
##  Mean   :378549.3                                                                                                                                                NA                                                                                                                                                              NA                                                                                                                                                              Mean   : 6.0                                                                                                                                                    Mean   :0                                                                                                                                                       NA                                                                                                                                                              Mean   :34280       NA                  Mean   : 7.5714     Mean   :0.7143      Mean   : 0.7143     Mean   :391498.3    NA                  Mean   :3.1429                  
##  3rd Qu.:416009.0                                                                                                                                                NA                                                                                                                                                              NA                                                                                                                                                              3rd Qu.:10.5                                                                                                                                                    3rd Qu.:0                                                                                                                                                       NA                                                                                                                                                              3rd Qu.:38450       NA                  3rd Qu.:13.0000     3rd Qu.:1.0000      3rd Qu.: 1.0000     3rd Qu.:424143.0    NA                  3rd Qu.:3.0000                  
##  Max.   :423526.0                                                                                                                                                NA                                                                                                                                                              NA                                                                                                                                                              Max.   :13.0                                                                                                                                                    Max.   :0                                                                                                                                                       NA                                                                                                                                                              Max.   :39189       NA                  Max.   :13.0000     Max.   :1.0000      Max.   : 5.0000     Max.   :426237.0    NA                  Max.   :7.0000                  
##  NA's   :1193                                                                                                                                                    NA                                                                                                                                                              NA                                                                                                                                                              NA's   :1193                                                                                                                                                    NA's   :1193                                                                                                                                                    NA                                                                                                                                                              NA's   :1193        NA                  NA's   :1193        NA's   :1193        NA's   :1193        NA's   :1193        NA                  NA's   :1193                    
##   cheese.season_id     cheese.number     cheese.long_title      cheese.cover     cheese.update_info
##  Min.   :359         Length:1200         Length:1200         Length:1200         Length:1200       
##  1st Qu.:359         Class :character    Class :character    Class :character    Class :character  
##  Median :359         Mode  :character    Mode  :character    Mode  :character    Mode  :character  
##  Mean   :359         NA                  NA                  NA                  NA                
##  3rd Qu.:359         NA                  NA                  NA                  NA                
##  Max.   :359         NA                  NA                  NA                  NA                
##  NA's   :1199        NA                  NA                  NA                  NA
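
Rather than hard-coding 4 pages, one could keep requesting pages until an empty one comes back. The sketch below is only an illustration: `fetch_all_pages()` and `fake_fetch_page()` are hypothetical helpers, the fake function stands in for the httr::GET call so that the example runs offline, and the assumption that the API returns an empty result past the last page has not been verified here.

```r
# Hypothetical helper: call `fetch_page(pn)` for pn = 1, 2, ... until a page
# comes back empty (or `max_pages` is reached), then stack the pages.
fetch_all_pages <- function(fetch_page, max_pages = 20) {
  pages <- list()
  for (pn in seq_len(max_pages)) {
    page <- fetch_page(pn)
    if (is.null(page) || NROW(page) == 0) break
    pages[[length(pages) + 1]] <- page
  }
  do.call(rbind, pages)
}

# Offline stand-in for the real request: 1200 fake records, 300 per page
fake_history <- data.frame(aid = seq_len(1200))
fake_fetch_page <- function(pn, page_size = 300) {
  start <- (pn - 1) * page_size + 1
  if (start > nrow(fake_history)) return(NULL)
  end <- min(pn * page_size, nrow(fake_history))
  fake_history[start:end, , drop = FALSE]
}

all_records <- fetch_all_pages(fake_fetch_page)
print(nrow(all_records))
```

With the real responses, which contain nested data-frame columns, dplyr::bind_rows as used above is probably the safer way to stack the pages than rbind.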

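The meaning of each column is documented in the project's section on fetching the full video history (old). The first thing to find out, though, is how far back our viewing records go.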

3 Tidying the Data

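Based on the earlier summary() output and the project's notes on fetching the full video history (old), duration is the length of the video and progress is the playback position, with a value of -1 for videos watched to the end. To compute the time actually played, we combine duration and progress into a new column, play_time.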

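view_at, which records when a video was watched, has values such as 1634781264. From experience this should be a Unix timestamp, so we convert it to a date-time with as.POSIXct. The viewing times are then grouped by time of day and by day of the week. pubdate is handled the same way.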

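tid, which identifies the category, is stored as a number, but given what it actually means it should be used together with tname. We use forcats::fct_reorder to order the levels of tname by tid.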

library("lubridate")
library("forcats")

histroy_tidy_tb <-
  history_tb %>%
  mutate(
    tname = fct_reorder(tname, tid),
    play_time = 
      if_else(progress<0, duration, progress),
    pubdate = 
      as.POSIXct(pubdate, origin = "1970-01-01"),
    view_at =
      as.POSIXct(view_at, origin = "1970-01-01"),
    date = date(view_at),
    time = round(local_time(view_at, units = "hours")),
    dow = wday(view_at, week_start = 1)
  )
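
The two key conversions can be checked on a small toy example. In this sketch only the timestamp 1634781264 comes from the text; the other values are made up, and base R's ifelse is used instead of dplyr::if_else so that the snippet has no dependencies. The time zone is set explicitly, with Asia/Shanghai as an assumption about where the records were produced.

```r
# Toy records; only 1634781264 comes from the text, the rest is made up
toy <- data.frame(
  duration = c(600L, 300L, 120L),
  progress = c(-1L, 45L, 120L),
  view_at  = c(1634781264L, 1634700000L, 1634600000L)
)

# A finished video (progress == -1) counts for its full duration;
# otherwise the recorded progress is taken as the time actually played.
toy$play_time <- ifelse(toy$progress < 0, toy$duration, toy$progress)

# Unix timestamp (seconds since 1970-01-01 UTC) to date-time
toy$view_at_utc <- as.POSIXct(toy$view_at, origin = "1970-01-01", tz = "UTC")

# The same instants rendered in an assumed local time zone
toy$view_at_local <- format(toy$view_at_utc, tz = "Asia/Shanghai", usetz = TRUE)

print(toy)
```

In UTC the first timestamp is 2021-10-21 01:54:24; at UTC+8 it reads 09:54:24 on the same day, which agrees with the latest view_at reported in section 4.1.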

4 Visualizing the Data

4.1 How long did I actually watch?

Let's start with the first question: over this period, how long did I actually spend watching Bilibili videos?

total_sec <- histroy_tidy_tb %>% 
  summarise(
    min = min(view_at),
    max = max(view_at),
    total = sum(play_time))

total_sec
## # A tibble: 1 × 3
##   min                 max                  total
##   <dttm>              <dttm>               <int>
## 1 2021-09-01 19:43:49 2021-10-21 09:54:24 331559

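From 2021-09-01 19:43:49 to 2021-10-21 09:54:24 I watched a total of 331559 seconds of video! That is 92.1 hours! This firmly belongs in the "Mom would smack me if she saw this" category.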
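
The conversion is simple arithmetic and can be extended to a rough daily average. The sketch below reuses the three numbers printed above; parsing both timestamps as UTC is an assumption made only so that the difference between them does not depend on the local time zone.

```r
total_sec <- 331559
first_view <- as.POSIXct("2021-09-01 19:43:49", tz = "UTC")
last_view  <- as.POSIXct("2021-10-21 09:54:24", tz = "UTC")

# Seconds to hours
total_hours <- total_sec / 3600

# Length of the observed period in days
span_days <- as.numeric(difftime(last_view, first_view, units = "days"))

# Average viewing time per calendar day over the period
hours_per_day <- total_hours / span_days

print(round(total_hours, 1))
print(round(span_days, 1))
print(round(hours_per_day, 2))
```

Spread over the roughly 50 days covered by the records, that comes to a little under two hours a day.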

4.2 When do I watch Bilibili?

Next, a new question: when do I actually watch Bilibili videos? We visualize the viewing time by date, by time of day, and by day of the week.

library("ggplot2")
library("cowplot")
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:lubridate':
## 
##     stamp
view_date_p <-
  histroy_tidy_tb %>%
  group_by(date) %>%
  summarise(duration_sum = sum(play_time, na.rm = TRUE)/3600) %>%
  ggplot(aes(x = date, y = duration_sum)) +
  geom_line() +
  scale_x_date(
    "",
    date_breaks = "7 day") +
  ylab("Total Play Duration\n(Hour)")

view_time_p <-
  histroy_tidy_tb %>%
  group_by(time) %>%
  summarise(duration_mean = mean(play_time, na.rm = TRUE)/3600) %>%
  ggplot(aes(x = time, y = duration_mean)) +
  geom_col() +
  scale_x_continuous(
    "Time of Day",
    limits = c(0, 24)) +
  ylab("Mean Play Duration\n(Hour/record)")

view_dow_p <-
  histroy_tidy_tb %>%
  group_by(dow) %>%
  summarise(duration_mean = mean(play_time, na.rm = TRUE)/3600) %>%
  ggplot(aes(x = dow, y = duration_mean)) +
  geom_col() +
  scale_x_continuous("Day of Week") +
  ylab("Mean Play Duration\n(Hour/record)")

view_bottom_grid_p <-
  plot_grid(view_time_p,
            view_dow_p,
            labels = c("B", "C"))

view_title_p <-
  ggdraw() +
  draw_label(
    "Play Duration (Hour)", 
    fontface = "bold"
  ) +
  theme(
    plot.margin = margin(0, 0, 0, 0)
  )

view_grid_p <-
  plot_grid(
    view_date_p,
    view_bottom_grid_p,
    view_title_p,
    labels = c("A", "", ""),
    rel_heights = c(1, 1, .1),
    nrow = 3)

view_grid_p
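
One caveat about panels B and C: mean(play_time) averages over individual history records within each group. If the quantity of interest is the average number of hours watched per day in each hour slot, one option is to sum within each date and hour first and then average over dates. The sketch below shows that alternative on made-up toy data; the column names mirror those used above, but the numbers are hypothetical.

```r
library("dplyr")

# Made-up records: two on the evening of Oct 1, one on the evening of Oct 2,
# and one on the morning of Oct 2
toy <- data.frame(
  date = as.Date(c("2021-10-01", "2021-10-01", "2021-10-02", "2021-10-02")),
  time = c(21, 21, 21, 9),
  play_time = c(1800, 1800, 3600, 900)
)

hourly_mean <- toy %>%
  # Step 1: total hours watched in each hour slot of each day
  group_by(date, time) %>%
  summarise(day_total = sum(play_time) / 3600, .groups = "drop") %>%
  # Step 2: average those daily totals for each hour slot
  group_by(time) %>%
  summarise(mean_hours_per_day = mean(day_total))

print(hourly_mean)
```

For the 21:00 slot this gives 1 hour per day, whereas the per-record mean of the same three rows would be about 0.67 hours. Note that days with no viewing in a given slot are simply absent here rather than counted as zero.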

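The plots suggest that from September 23 to October 2 I watched more Bilibili videos than usual. Mondays and Tuesdays also seem to have longer viewing times. The most interesting part, though: just what kind of night owl am I, still not resting in the small hours? Take care of your health, young man (or rather, middle-aged man).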

4.3 What kinds of videos did I watch?

Having spent that much time watching videos, what exactly did I watch?

library(forcats)
library(showtext)
## Loading required package: sysfonts
## Loading required package: showtextdb
showtext_auto()

histroy_tidy_tb %>% 
  select(
    tname,
    play_time
  ) %>% 
  group_by(tname) %>% 
  summarise(duration_sum = sum(play_time, na.rm = TRUE)/3600) %>% 
  mutate(tname = fct_reorder(
    tname, duration_sum
  )) %>% 
  ggplot(aes(x = tname,
             y = duration_sum)) +
  geom_col() +
  coord_flip() +
  labs(x = "Sub-category",
       y = "Play duration (hours)") +
  theme(text = element_text(family = "source-han-sans-cn"))

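Next, let's see whether the types of videos I watch differ by time of day. We group by time and tname and look at the play duration of each video type across the day.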

showtext_auto()

histroy_tidy_tb %>% 
  group_by(time, tname) %>%
  summarise(duration_mean_by_type = mean(play_time, na.rm = TRUE)/3600)  %>%
  select(
    tname, time, duration_mean_by_type
  ) %>% 
  ggplot(aes(x = time,
             y = duration_mean_by_type,
             fill = tname
             )) +
  geom_col()

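However, there are so many categories that no pattern stands out. To make the plot readable, we merge the categories with little play time: any category whose total play time is below 1% of the overall total is grouped into "other".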

library("colorspace")

history_type_aggregate_tb <-
  histroy_tidy_tb %>%
  select(tname,
         play_time) %>%
  group_by(tname) %>%
  summarise(duration_sum = sum(play_time, na.rm = TRUE) / 3600) %>%
  mutate(percentage = duration_sum / sum(duration_sum),
         tname = as.character(tname)) %>%
  mutate(type = if_else(percentage >= .01, tname, 'other')) %>%
  group_by(type) %>%
  summarise(duration_sum = sum(duration_sum)) %>% 
  arrange(desc(duration_sum))

DT::datatable(history_type_aggregate_tb)
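
As an aside, forcats has a helper for this kind of lumping. The following is a minimal sketch on made-up data, not the method used in this post: it assumes that fct_lump_prop with play time as the weight w matches the intent of the 1% rule, and the category names and numbers are hypothetical.

```r
library("forcats")

# Made-up play times (seconds): "music" and "dance" each fall below 1%
toy <- data.frame(
  tname = c("game", "game", "food", "music", "dance"),
  play_time = c(6000, 3000, 900, 60, 40)
)

# Lump categories whose weighted share is below 1% into "other"
toy$type <- fct_lump_prop(
  toy$tname,
  prop = 0.01,
  w = toy$play_time,
  other_level = "other"
)

# Total play time per lumped category
print(tapply(toy$play_time, toy$type, sum))
```

This labels each record directly, so the lumped column could be used for grouping without a separate lookup table.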
showtext_auto()

histroy_tidy_tb %>% 
  mutate(
    tname = as.character(tname),
    type = if_else(tname %in% history_type_aggregate_tb$type, 
                   tname,
                   "other"
    )) %>% 
  group_by(time, type) %>%
  summarise(duration_mean_by_type = mean(play_time, na.rm = TRUE)/3600)  %>%
  select(
    type, time, duration_mean_by_type
  ) %>% 
  ggplot(aes(x = time,
             y = duration_mean_by_type,
             fill = type,
             label = type
  )) +
  geom_col() +
  labs(x = "Time of day",
       y = "Play duration\n(hours)",
       fill = "Video category") +
  theme_classic() +
  scale_fill_discrete_sequential("Batlow")

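It seems I am still that kid who loves watching other people play games: whenever I am on Bilibili, at any time of day, part of the time goes to single-player games. In the small hours and around noon I lean toward film and TV commentary videos. And by mid-afternoon and evening, I like watching food shows... truly a kid who weighs a few hundred jin (shrug).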

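This post is already long enough, so further analysis is left for follow-up posts. For now, let's save the data for later use. Parquet, a widely supported columnar file format that is also common in the Hadoop ecosystem (see apache/arrow for details), would be a natural choice for a tibble. The code below takes the simpler route of storing the object as an .Rdata file with s3save, directly in a local MinIO object store.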

library(minio.s3)
# bucket <-
#   minio.s3::put_bucket("bili-history")
s3save(histroy_tidy_tb, object = "histroy_tidy_tb.Rdata", bucket = 'bili-history')
get_bucket('bili-history')
## Bucket: bili-history 
## 
## $Contents
## Key:            histroy_tidy_tb.Rdata 
## LastModified:   2021-10-21T03:27:53.416Z 
## ETag:           "7a07c6f484da97bf7cac3fa97e898991" 
## Size (B):       347773 
## Owner:          minio 
## Storage class:  STANDARD
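
For reference, writing a data frame to Parquet with the arrow package could look like the sketch below. It is a hedged illustration on a toy data frame written to a temporary local file, not to MinIO, and I have not checked how the nested data-frame columns of the real history table behave when written this way.

```r
library("arrow")

# Toy stand-in for the tidy history table (values are made up)
toy <- data.frame(
  tname = c("game", "food"),
  play_time = c(3600L, 900L)
)

# Write to a temporary Parquet file and read it back
parquet_path <- tempfile(fileext = ".parquet")
arrow::write_parquet(toy, parquet_path)
restored <- arrow::read_parquet(parquet_path)

print(restored)
```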

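Feel free to reach me by email, Weibo, Twitter or Zhihu. You are also welcome to follow my blog. And if you take an interest in my GitHub, nothing would make me happier!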