Summarising Top 100 UK Climbs: Running Local Language Models with LM Studio and R

Install LM Studio and download models

Before we begin, ensure you have both LM Studio and R installed.

After you have downloaded and installed LM Studio, open the
application. Go to the Discover tab (sidebar), where
you can browse and search for models. In this example, we will be using
the Phi-3-mini-4k-instruct
model, but you can of course experiment with any other model that you
prefer – as long as you’ve got the hardware to run it!

Now, select the model from the top bar to load it.

To check that everything is working fine, go to the
Chat tab on the sidebar and start a new chat to
interact with the Phi-3 model directly. You’ve now got your language
model up and running!
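
If you would rather confirm from R that the model is reachable, a minimal sketch along these lines may help. It assumes that LM Studio's local server has been started and is listening at its default address, http://localhost:1234, with an OpenAI-compatible /v1/models endpoint. Adjust the URL if your setup differs. server_is_up() is a hypothetical helper name, and httr must already be installed (installation is covered in the next section).

```r
library(httr)

# Assumption: LM Studio's local server is running on its default port (1234)
lm_studio_models_url <- "http://localhost:1234/v1/models"

# Hypothetical helper: TRUE if the server answers with HTTP 200, FALSE otherwise
server_is_up <- function(url = lm_studio_models_url) {
  response <- tryCatch(
    GET(url, timeout(2)),        # give up after two seconds
    error = function(e) NULL     # connection refused, timeout, etc.
  )
  !is.null(response) && status_code(response) == 200
}

server_is_up()
```

A FALSE result only means that nothing answered at that address. The model may still work fine in the Chat tab if the local server has not been switched on.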

Required R Packages

To effectively work with LM Studio, we will need several R
packages:

tidyverse – for data manipulation
httr – for API interaction
jsonlite – for JSON parsing

You can install/update them all with one line of code:

# Install necessary packages
install.packages(c("tidyverse", "httr", "jsonlite"))
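
If some of these packages are already installed and you would prefer not to reinstall them, one possible variation is to install only what is missing. This is a sketch rather than part of the original workflow; find_missing_packages() is a hypothetical helper name, and the interactive() guard is an added precaution so that nothing is installed from a non-interactive script.

```r
required_packages <- c("tidyverse", "httr", "jsonlite")

# Hypothetical helper: return the packages that are not yet installed
find_missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

missing_packages <- find_missing_packages(required_packages)

# Only install when running interactively and something is actually missing
if (interactive() && length(missing_packages) > 0) {
  install.packages(missing_packages)
}
```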

Let us set up the R script by loading the packages and the data we
will be working with:

# Load the packages
library(tidyverse)
library(httr)
library(jsonlite)

top_100_climbs_df <- read_csv("https://raw.githubusercontent.com/martinctc/blog/refs/heads/master/datasets/top_100_climbs.csv")

The top_100_climbs_df dataset contains information on
the top 100 cycling climbs in the UK. I pulled the data from the
Cycling Uphill website; the list itself was originally put together by
Simon Warren. The dataset has 100 rows and the following columns:

climb_id: unique row identifier for the climb
climb: name of the climb
height_gain_m: height gain in meters
average_gradient: average gradient of the climb, stored as a proportion (0.05 means 5%)
length_km: total length of the climb in kilometers
max_gradient: maximum gradient of the climb, stored as a proportion
url: URL to the climb’s page on Cycling Uphill

Here is what the dataset looks like when we run
dplyr::glimpse():

glimpse(top_100_climbs_df)
## Rows: 100
## Columns: 7
## $ climb_id         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
## $ climb            <chr> "Cheddar Gorge", "Weston Hill", "Crowcombe Combe", "P…
## $ height_gain_m    <dbl> 150, 165, 188, 372, 326, 406, 166, 125, 335, 163, 346…
## $ average_gradient <dbl> 0.05, 0.09, 0.15, 0.12, 0.10, 0.04, 0.11, 0.11, 0.06,…
## $ length_km        <dbl> 3.5, 1.8, 1.2, 4.9, 3.2, 11.0, 1.5, 1.1, 5.4, 1.4, 9.…
## $ max_gradient     <dbl> 0.16, 0.18, 0.25, 0.25, 0.17, 0.12, 0.25, 0.18, 0.12,…
## $ url              <chr> "https://cyclinguphill.com/cheddar-gorge/", "https://…

Our goal here is to use this dataset to generate text descriptions
for each of the climbs using the language model. Since this is for text
generation, we will do a bit of cleaning up of the dataset, converting
gradient values to percentages:

top_100_climbs_df_clean <- top_100_climbs_df %>%
  mutate(
    average_gradient = scales::percent(average_gradient),
    max_gradient = scales::percent(max_gradient)
    )
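
To see what this conversion does, and how the cleaned values might later feed into a prompt, here is a small sketch on three rows copied by hand from the glimpse() output above. The sample_climbs tibble and the prompt_text column are hypothetical names used only for illustration, and the sentence template is an assumption, not the wording used elsewhere in this post.

```r
library(dplyr)

# Three rows copied from the glimpse() output (illustrative subset only)
sample_climbs <- tibble(
  climb = c("Cheddar Gorge", "Weston Hill", "Crowcombe Combe"),
  height_gain_m = c(150, 165, 188),
  average_gradient = c(0.05, 0.09, 0.15),
  length_km = c(3.5, 1.8, 1.2),
  max_gradient = c(0.16, 0.18, 0.25)
)

sample_prompts <- sample_climbs %>%
  mutate(
    # Same conversion as above: proportions become percentage strings
    average_gradient = scales::percent(average_gradient),
    max_gradient = scales::percent(max_gradient),
    # Hypothetical column: one line of plain text per climb
    prompt_text = paste0(
      climb, ": ", length_km, " km long, ", height_gain_m,
      " m of height gain, average gradient ", average_gradient,
      ", maximum gradient ", max_gradient
    )
  )

sample_prompts$prompt_text
```

Note that after the conversion the gradient columns are character strings, not numbers. That suits text generation, but any numeric filtering or sorting on gradient should be done on the original top_100_climbs_df.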
