# GTFS Tables

## GTFS Table Relationships

Figure: relationships between GTFS tables. Source: Wikimedia, user -stk.

In addition to the tables described above, trread attempts to calculate the following tables when one uses read_gtfs():

• routes_frequency_df
• stops_frequency_df

When it reads a GTFS feed, trread prints a message about calculating these tables.
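Both tables are built from headways, the time between consecutive departures at a stop. The sketch below illustrates that idea on a small made-up stop_times-style table. It is not trread's implementation, and the helper gtfs_time_to_seconds() and the toy data are hypothetical. Times are parsed by hand because GTFS allows values past 24:00:00 for trips that run after midnight.

```r
library(dplyr)

# Hypothetical helper: convert GTFS "HH:MM:SS" strings to seconds after midnight.
# GTFS times may exceed 24:00:00, so we avoid R's date-time classes here.
gtfs_time_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Made-up departures at two stops; stop B is served across midnight
toy_stop_times <- tibble(
  stop_id = c("A", "A", "A", "B", "B"),
  departure_time = c("08:00:00", "08:10:00", "08:25:00", "23:50:00", "24:05:00")
)

stop_headways <- toy_stop_times %>%
  mutate(dep_s = gtfs_time_to_seconds(departure_time)) %>%
  group_by(stop_id) %>%
  arrange(dep_s, .by_group = TRUE) %>%
  # headway = gap to the previous departure at the same stop, in minutes
  mutate(headway_min = (dep_s - lag(dep_s)) / 60) %>%
  summarise(mean_headway_min = mean(headway_min, na.rm = TRUE),
            departures = n())

stop_headways
```

Stop A gets a mean headway of 12.5 minutes (gaps of 10 and 15), and stop B gets 15 minutes even though its second departure is written as 24:05:00.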

```r
library(trread)
library(dplyr)
# Read in GTFS feed.
# Here we use a feed included in the package, but you can also pass the URL of the
# New York City Metropolitan Transportation Authority GTFS feed to read_gtfs().
local_gtfs_path <- system.file("extdata", "google_transit_nyc_subway.zip",
                               package = "trread")
nyc <- read_gtfs(local_gtfs_path,
                 local = TRUE,
                 frequency = TRUE)
#> Calculating route and stop headways.
```

## Example GTFS Table Joins

### Route Frequencies to Routes

For example, we can join the standard routes table, which holds route names such as ‘route_short_name’ and ‘route_long_name’, to routes_frequency_df on ‘route_id’.

```r
routes_df_frequencies <- nyc$routes_df %>%
  inner_join(nyc$routes_frequency_df, by = "route_id") %>%
  select(route_long_name,
         stop_count)
head(routes_df_frequencies)
#> # A tibble: 6 x 5
#>   <chr>                      <int>         <int>           <dbl>      <int>
#> 1 Broadway - 7 Av…               5             5            0.15         76
#> 2 7 Avenue Express               7            51          135.          120
#> 3 7 Avenue Express               8             8            0.08         68
#> 4 Lexington Avenu…               6           115          205.           77
#> 5 Lexington Avenu…               9           110          271.          102
#> 6 Lexington Avenu…              48            48            0            29
```
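Because inner_join() keeps only rows whose route_id appears in both tables, any route for which no frequency could be calculated silently drops out of the result. dplyr's anti_join() lists exactly those routes. The sketch below uses small made-up tables in place of nyc$routes_df and nyc$routes_frequency_df; the column headway_min is a hypothetical stand-in, not one of trread's actual column names.

```r
library(dplyr)

# Made-up stand-ins for nyc$routes_df and nyc$routes_frequency_df
routes <- tibble(
  route_id = c("A", "B", "C"),
  route_long_name = c("Route A", "Route B", "Route C")
)
route_freq <- tibble(
  route_id = c("A", "B"),
  headway_min = c(5, 7)  # hypothetical column, for illustration only
)

# inner_join() keeps only routes present in both tables
joined <- inner_join(routes, route_freq, by = "route_id")

# anti_join() returns the routes that inner_join() dropped
dropped <- anti_join(routes, route_freq, by = "route_id")

nrow(joined)      # 2
dropped$route_id  # "C"
```

Running the anti_join() check before relying on the joined table is a cheap way to notice routes that have no computed frequencies.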

### Headways at Stops for a Route

A more complex example of cross-table joins is to pull the stops and their headways for a given route.

Answering this simple question is a good way to learn the GTFS data model, because it touches the calendar, routes, trips, stop_times and stops tables.

First, we need a ‘service_id’. A service_id identifies the days of the week, and the date range within the year, on which a set of trips runs; those trips in turn determine which stops a route serves on a given day.
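In the GTFS calendar table, a service runs on a date when the column for that date's weekday is 1 and the date falls between start_date and end_date (exceptions listed in calendar_dates aside). A minimal sketch of that rule follows, on a made-up calendar with the same columns as calendar_df; the helper services_on() is hypothetical and ignores calendar_dates.

```r
library(dplyr)

# Made-up calendar with the same weekday/date columns as calendar_df
toy_calendar <- tibble(
  service_id = c("WKD", "SAT", "OLD_WKD"),
  monday = c(1L, 0L, 1L), tuesday = c(1L, 0L, 1L), wednesday = c(1L, 0L, 1L),
  thursday = c(1L, 0L, 1L), friday = c(1L, 0L, 1L),
  saturday = c(0L, 1L, 0L), sunday = c(0L, 0L, 0L),
  start_date = as.Date(c("2018-01-01", "2018-01-01", "2017-01-01")),
  end_date   = as.Date(c("2018-12-31", "2018-12-31", "2017-12-31"))
)

# Hypothetical helper: service_ids scheduled on `date` according to calendar alone
# (additions/removals in calendar_dates are ignored in this sketch)
services_on <- function(calendar, date) {
  day_cols <- c("sunday", "monday", "tuesday", "wednesday",
                "thursday", "friday", "saturday")
  day_col <- day_cols[as.POSIXlt(date)$wday + 1]  # wday: 0 = Sunday, locale-independent
  calendar %>%
    filter(.data[[day_col]] == 1, start_date <= date, end_date >= date) %>%
    pull(service_id)
}

services_on(toy_calendar, as.Date("2018-06-04"))  # a Monday: "WKD"
```

OLD_WKD is flagged for Mondays too, but its date range ended in 2017, so it is excluded.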

When calculating frequencies, trread tries to guess which service_id is representative of a standard weekday by walking through a set of steps. Below, we do a few of these steps by hand.
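As an illustration of what such a guess can look like, one simple heuristic is to take, among the services that run Monday through Friday, the one referenced by the most trips. This is a hypothetical stand-in, not trread's actual procedure, and it skips the date-range check; the tables below are made up.

```r
library(dplyr)

# Made-up calendar and trips tables
toy_calendar <- tibble(
  service_id = c("WKD", "WKD_ALT", "SAT"),
  monday = c(1L, 1L, 0L), tuesday = c(1L, 1L, 0L), wednesday = c(1L, 1L, 0L),
  thursday = c(1L, 1L, 0L), friday = c(1L, 1L, 0L),
  saturday = c(0L, 0L, 1L), sunday = c(0L, 0L, 0L)
)
toy_trips <- tibble(
  trip_id = paste0("t", 1:6),
  service_id = c("WKD", "WKD", "WKD", "WKD_ALT", "SAT", "SAT")
)

# Hypothetical heuristic, step 1: services that run on every weekday
weekday_ids <- toy_calendar %>%
  filter(monday == 1, tuesday == 1, wednesday == 1, thursday == 1, friday == 1) %>%
  pull(service_id)

# Step 2: among those, the service used by the most trips
representative_id <- toy_trips %>%
  filter(service_id %in% weekday_ids) %>%
  count(service_id, sort = TRUE) %>%
  slice(1) %>%
  pull(service_id)

representative_id  # "WKD"
```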

Let's start with a sample of rows from calendar_df.

```r
head(sample_n(nyc$calendar_df, 10))
#> # A tibble: 6 x 10
#>   service_id monday tuesday wednesday thursday friday saturday sunday
#>   <chr>       <int>   <int>     <int>    <int>  <int>    <int>  <int>
#> 1 BSP18GEN-…      1       1         1        1      1        0      0
#> 2 BSP18GEN-…      0       0         0        0      0        1      0
#> 3 BSP18GEN-…      1       1         1        1      1        0      0
#> 4 BSP18GEN-…      1       1         1        1      1        0      0
#> 5 SIR-SP201…      0       0         0        0      0        0      1
#> 6 BSP18GEN-…      0       0         0        0      0        1      0
#> # … with 2 more variables: start_date <date>, end_date <date>
```

Then we'll pull a random route_id and the set of service_ids that run on Mondays.

```r
select_service_id <- filter(nyc$calendar_df, monday == 1) %>% pull(service_id)
select_route_id <- sample_n(nyc$routes_df, 1) %>% pull(route_id)
```

Now we'll filter down through the data model to just the stops for that route and those service_ids.

```r
some_trips <- nyc$trips_df %>%
  filter(route_id %in% select_route_id & service_id %in% select_service_id)

some_stop_times <- nyc$stop_times_df %>% filter(trip_id %in% some_trips$trip_id)

some_stops <- nyc$stops_df %>% filter(stop_id %in% some_stop_times$stop_id)
```
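The %in% filters above keep the rows whose keys appear in the previous table, which is what dplyr's semi_join() does. As a sketch, on made-up tables standing in for trips_df, stop_times_df and stops_df, the same walk through the data model can be written with filtering joins:

```r
library(dplyr)

# Made-up stand-ins for trips_df, stop_times_df and stops_df
toy_trips <- tibble(
  trip_id = c("t1", "t2", "t3"),
  route_id = c("A", "A", "B"),
  service_id = c("WKD", "SAT", "WKD")
)
toy_stop_times <- tibble(
  trip_id = c("t1", "t1", "t2", "t3"),
  stop_id = c("s1", "s2", "s3", "s4")
)
toy_stops <- tibble(
  stop_id = c("s1", "s2", "s3", "s4"),
  stop_name = c("First St", "Second St", "Third St", "Fourth St")
)

# trips for route A on the WKD service
route_trips <- toy_trips %>% filter(route_id == "A", service_id == "WKD")

# semi_join() keeps rows of the first table that have a match in the second
route_stop_times <- semi_join(toy_stop_times, route_trips, by = "trip_id")
route_stops <- semi_join(toy_stops, route_stop_times, by = "stop_id")

route_stops$stop_name  # "First St" "Second St"
```

Unlike inner_join(), semi_join() never duplicates rows or adds columns from the second table, so each stop appears once however many trips serve it.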