
Building a linear regression MCP server


I wanted to learn more about MCP, so I built an MCP server that adds regression analysis capabilities to AI models. Here, we’ll discuss what MCP is, why I chose linear regression, how to test an MCP server, and some limitations I currently see in the protocol and its implementations.

The repository is here: mcp-ols.

What is MCP?

Model Context Protocol (MCP), according to the official website, is:

[…] an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.

It has been developed by Anthropic with the goal of providing a standardized interface for LLMs to interact with applications. They’ve also provided some reference servers that show how MCP servers can add various capabilities to LLMs, such as fetching web content or interacting with the local file system.

The host (an LLM-enabled application) starts an MCP client that maintains a 1:1 connection with the server. This connection happens through a transport layer, usually stdio, but the specification also supports streamable HTTP transport. The stdio transport is suitable for running servers locally on your machine, and it’s the one we’ll use here. When running local MCP servers with stdio transport, the host usually starts the server process.
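
For a concrete sense of what travels over that transport: the messages are JSON-RPC 2.0. Roughly, and abbreviated from the specification (the values here are illustrative), initialization, tool discovery, and a tool call look like this:

{"jsonrpc": "2.0", "id": 1, "method": "initialize",
 "params": {"protocolVersion": "2025-03-26", "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"}}}

{"jsonrpc": "2.0", "id": 2, "method": "tools/list"}

{"jsonrpc": "2.0", "id": 3, "method": "tools/call",
 "params": {"name": "some_tool", "arguments": {"arg": "value"}}}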

Linear regression is really useful

Ordinary least squares (OLS) regression is a simple yet unreasonably effective statistical model. To put it simply, it estimates the parameters of a linear equation of the form:

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \varepsilon

It’s easy to compute and easy to interpret. Each coefficient tells you how much Y is expected to change for each unit change in X_i, all else being equal.

You may think this formulation is too restrictive, but bear with me. Let’s say you have blood pressure data for two groups (treated or not) in a clinical trial, and you need to know whether the treatment had any effect. It’s simple: run an OLS regression of the form blood_pressure ~ treatment, where treatment is a binary indicator. This ~ formula notation comes from R and is a very convenient way to specify a linear regression model; here, it also doubles as an LLM-friendly way of creating regression models.
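
As a minimal sketch of what that looks like in statsmodels (with made-up numbers):

import pandas as pd
import statsmodels.formula.api as smf

# Made-up trial data: treatment is a 0/1 indicator
df = pd.DataFrame({
    "blood_pressure": [118, 135, 121, 140, 115, 132],
    "treatment": [1, 0, 1, 0, 1, 0],
})

# The coefficient on treatment estimates the mean difference between groups
fit = smf.ols("blood_pressure ~ treatment", data=df).fit()
print(fit.params)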

This example can be extended to multiple groups by adding a dummy binary variable for each group value. Luckily, we don’t need to do this by hand, as the formula notation lets us specify categorical variables directly:

revenue ~ C(store) + temperature + C(dayofweek) + advertising_cost
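
Under the hood, patsy expands C(...) into dummy columns, one per level minus a reference level. A quick illustration with hypothetical data:

from patsy import dmatrix
import pandas as pd

df = pd.DataFrame({"dayofweek": ["mon", "tue", "wed", "mon"]})

# Produces an intercept plus dummy columns C(dayofweek)[T.tue] and C(dayofweek)[T.wed]
print(dmatrix("C(dayofweek)", df))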

meme with one astronaut asking "wait, it's all OLS?" while another points a gun to the first saying "always has been"

The same OLS regression can be used to perform analyses equivalent to other statistical methods such as ANOVA, correlation, and t-tests. It’s widely used in many areas of scientific research, finance, marketing, econometrics, among others. When you look closely, OLS is truly everywhere, mainly due to its interpretability and solid statistical foundation. Simple linear models often outperform experts despite their limitations, which is quite remarkable.
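
To make the t-test equivalence concrete, here is a quick check on synthetic data: the group coefficient in y ~ g reproduces the classic pooled two-sample t-test.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(120, 10, size=50)
treated = rng.normal(115, 10, size=50)
df = pd.DataFrame({
    "y": np.concatenate([control, treated]),
    "g": [0] * 50 + [1] * 50,
})

t, p = stats.ttest_ind(control, treated)  # pooled-variance t-test
fit = smf.ols("y ~ g", data=df).fit()

# The p-values match; the t statistics differ only in sign
print(t, p)
print(fit.tvalues["g"], fit.pvalues["g"])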

Adding statistical capabilities to LLMs

You can add an entire dataset (e.g. a CSV file) to a chat context and have the LLM analyze it, but extracting meaningful conclusions from raw CSV tokens is no easy feat. A common approach is to have the LLM write code to analyze the data. This MCP server aims to be a middle ground, providing an interface to quickly analyze data using OLS — without adding the dataset to the chat context. It’s safer than allowing arbitrary code execution, but the real motivation is that I wanted to implement an MCP server.

The project was written in Python. Here is the repo. The official Python SDK provides some straightforward ways to build an MCP server, especially the FastMCP interface — not to be confused with the FastMCP library. Version 1.0 of FastMCP was incorporated into the mcp SDK, but since then both projects have apparently diverged a bit. Still, both are pretty similar.

We can define a bare-bones linear regression tool like this:

import pandas as pd
import statsmodels.formula.api as smf
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("linear-regression")

cache = []  # holds the loaded DataFrame (filled in by a data-loading tool)


@mcp.tool()
def run_ols_regression(formula: str):
    """Run a linear regression based on a patsy formula

    Args:
        formula: string of format Y ~ X_1 + X_2 + ... + X_n
    """
    # cache[0] is the DataFrame loaded by another tool
    model = smf.ols(formula, cache[0]).fit()
    return model.summary().as_html()

The @mcp.tool() decorator registers the function as a tool, extracting its arguments and docstring. We still have to read data into the “cache” list, though. The quick and dirty solution was to create a session class to keep data and models:

import uuid
from datetime import datetime
from typing import Any


class DataAnalysisSession:
    def __init__(self):
        self.sessions: dict[str, dict[str, Any]] = {}

    def create_session(self) -> str:
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = {
            "data": None,
            "metadata": {},
            "models": {},
            "created_at": datetime.now(),
        }
        return session_id

    def get_session(self, session_id: str) -> dict[str, Any]:
        if session_id not in self.sessions:
            raise ValueError(f"Session {session_id} not found")
        return self.sessions[session_id]

_session = DataAnalysisSession()
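
A tool can then create a session and stash a DataFrame in it. A minimal sketch (not the repo’s exact code):

@mcp.tool()
def load_data(file_path: str) -> str:
    """Load a CSV file into a new session and return its session id."""
    session_id = _session.create_session()
    _session.get_session(session_id)["data"] = pd.read_csv(file_path)
    return session_id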

Honestly, I find this a bit ugly, as I dislike module-level global state. However, since each local MCP server session is started as a new process, this is not a big issue. For remote MCP servers using streamable HTTP, this solution wouldn’t be appropriate.

Many MCP servers are stateless, which is arguably better, considering that LLMs only have access to context, which may be detached from the MCP server state. The MCP documentation talks about replaying steps when using streamable HTTP mode, but this is beyond the scope of this project. The FastMCP framework is very convenient, but in my opinion both frameworks still have some rough edges, especially regarding testing.

The OLS part of the server is a simple wrapper around statsmodels, so there isn’t much relevant unit testing to be done. I then added some visualization tools, such as a residual plots tool, which returns PNG image data using the Image type available in both mcp and FastMCP. The image data is returned to the client as a base64-encoded string, and I’ve found that many MCP clients do not support image outputs yet. In those cases, the base64 string is added to the model context, which usually leads to the LLM trying (and failing) to replicate the image data string. Sending raw image data instead of the image itself to multimodal models also consumes many more tokens, often exceeding the maximum context length.
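
For reference, here is a hedged sketch of what such a plotting tool can look like using the SDK’s Image type (the repository’s actual implementation may differ):

import io

import matplotlib
matplotlib.use("Agg")  # headless backend, since the server has no display
import matplotlib.pyplot as plt
from mcp.server.fastmcp import Image

@mcp.tool()
def plot_residuals(session_id: str, model_name: str) -> Image:
    """Plot residuals against fitted values for a stored model."""
    model = _session.get_session(session_id)["models"][model_name]
    fig, ax = plt.subplots()
    ax.scatter(model.fittedvalues, model.resid, s=12)
    ax.axhline(0, color="red", linewidth=1)
    ax.set(xlabel="Fitted values", ylabel="Residuals")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return Image(data=buf.getvalue(), format="png")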

In its current form, the server contains tools to load data (of course), describe the data, run an OLS regression, and run a logistic regression¹. Then there is a set of tools to interpret models: plot residuals, test model assumptions, compute a VIF² table, plot influence diagnostics, create partial dependence plots, list models, compare models, and plot model comparisons. It should be able to read CSV, Excel, JSON, and parquet files, as well as tables from SQLite databases.
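
The file-type dispatch is straightforward with pandas. Something along these lines would do it (a hypothetical loader, not the repo’s exact code):

import sqlite3
from pathlib import Path

import pandas as pd

def read_table(path: str, table: str | None = None) -> pd.DataFrame:
    """Read a tabular file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xls", ".xlsx"):
        return pd.read_excel(path)
    if suffix == ".json":
        return pd.read_json(path)
    if suffix == ".parquet":
        return pd.read_parquet(path)
    if suffix in (".db", ".sqlite"):
        with sqlite3.connect(path) as conn:
            return pd.read_sql(f"SELECT * FROM {table}", conn)
    raise ValueError(f"Unsupported file type: {suffix}")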

Using the MCP server

I recommend using Claude Desktop, since it’s probably the most polished MCP client experience. Unfortunately, it’s not available on Linux yet, so I haven’t used it here. You may add the MCP server using a CLI/GUI of your choice (e.g. the claude CLI), or you may edit a JSON configuration file directly. Below is an example in the style of Claude Desktop:

{
  "mcpServers": {
    "mcp-ols": {
      "command": "uvx",
      "args": ["mcp-ols"]
    }
  }
}

You can copy the mcp-ols part and paste it into most applications with MCP support, including VS Code. We’re using uv to run the MCP server, so you should install it first.

Activate the MCP server in a new chat and then ask it to analyze a data file. I’ll use the Advertising dataset from chapter 2 of ISLR³, which can be found here. The LLM should then be able to use the tools to load and analyze the data, and run regression models.

Chat example

I’ve omitted tool calls and filler text.

Let’s analyze the data in ~/Documents/Advertising.csv

[…] Here’s a summary of your Advertising dataset […]

Yes, let’s run a regression on sales

Here are the results of the linear regression predicting Sales from advertising spend on TV, Radio, and Newspaper:

R-squared: 0.897 (89.7% of the variance in Sales is explained by the model)
Adjusted R-squared: 0.896
F-statistic: 570.3, p-value: <0.001 (model is highly significant)

Coefficients:

  • Intercept: 2.94
  • TV: 0.0458 (highly significant, p < 0.001)
  • Radio: 0.1885 (highly significant, p < 0.001)
  • Newspaper: -0.0010 (not significant, p = 0.86)

Interpretation:

TV and Radio advertising both have a strong, statistically significant positive effect on Sales. Newspaper advertising appears to have no significant impact on Sales in this model.

Try some models with interactions and compare them

[…]

| Model | Formula | R-squared | Adj. R² | AIC | Significant interactions? |
|-------|---------|-----------|---------|-----|---------------------------|
| 1 | Sales ~ TV + Radio + Newspaper | 0.897 | 0.896 | 780.4 | None |
| 2 | Sales ~ TV + Radio + Newspaper + TV:Radio | 0.968 | 0.967 | 550.1 | TV:Radio (very significant) |
| 3 | Sales ~ TV + Radio + Newspaper + TV:Newspaper | 0.903 | 0.901 | 769.8 | TV:Newspaper (significant) |
| 4 | Sales ~ TV + Radio + Newspaper + Radio:Newspaper | 0.897 | 0.895 | 781.9 | Radio:Newspaper (not significant) |
| 5 | Sales ~ TV * Radio * Newspaper (all interactions) | 0.969 | 0.968 | 550.9 | Only TV:Radio |

Key findings:

  • Adding the TV:Radio interaction (Model 2) gives a big boost to R² and AIC, indicating a much better fit. The interaction is very significant (p < 0.001).
  • Adding TV:Newspaper (Model 3) gives a small boost; the interaction is significant, but the increase in fit is limited.
  • Adding Radio:Newspaper (Model 4) does not improve the model.
  • The full interaction model (Model 5) has the highest R² (marginally above Model 2) and the lowest AIC, but only TV:Radio remains highly significant. The additional interaction terms are not significant.

Conclusion:

  • The best balance of model fit and simplicity is Model 2 (with the TV:Radio interaction).
  • Model 5 is technically “best” on R²/AIC, but the extra interaction terms do not contribute much and add unnecessary complexity.
  • Most important features for predicting Sales: TV, Radio, and their interaction.

I’ve also asked it to plot residuals and analyze the plots. If you use a multimodal model, it should be able to interpret the plots. The tool call returned the following image:

OLS residuals diagnostic plots

This is a simple example, but you can use this MCP server to experiment with more complex datasets, compare models, check assumptions, plot partial regressions, and so on.

Testing

From what I’ve gathered, I see three major ways to write automated tests for MCP servers. The first is the usual unit testing. This project is a thin wrapper around statsmodels and matplotlib, so even though it would be possible to write unit tests, they were not my focus. I think unit tests here are better suited to functions that tools call in more complex scenarios and to checking tool input validation.

The second way of testing is using tool calls directly, with the server running in-memory. This is the approach I’ve used here, with the FastMCP library. However, it is prone to test pollution — that is, a test may fail due to the side effects of other tests on the server state. The MCP server instance is a module-level global variable, so all tests share the same MCP server state. I’ve tried creating the server instance and adding tools dynamically in a main function, but it was very clunky and, more importantly, it broke the MCP inspector integration.
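
With the FastMCP library, such an in-memory test looks roughly like this (a sketch assuming pytest with the pytest-asyncio plugin; the import path is hypothetical):

import pytest
from fastmcp import Client

from mcp_ols import mcp  # the module-level FastMCP server instance

@pytest.mark.asyncio
async def test_lists_the_regression_tool():
    # Client(server) talks to the server in-memory, without spawning a process
    async with Client(mcp) as client:
        tools = await client.list_tools()
        assert any(t.name == "run_ols_regression" for t in tools)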

The MCP inspector is an interactive developer tool for testing and debugging MCP servers through a web interface, which is quite useful. Alas, the tests still share global state.
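
You can point the inspector at this server with a one-liner (assuming Node.js is installed):

npx @modelcontextprotocol/inspector uvx mcp-ols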

The third way of testing is end-to-end testing, in which you instantiate a new client-server pair for each test. For instance, if we wanted to list all available tools, we could implement something like this:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # Launch the server as a subprocess speaking MCP over stdio
    server_params = StdioServerParameters(command="python", args=["mcp_ols.py"])

    async with stdio_client(server_params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()

            tools_result = await session.list_tools()
            for tool in tools_result.tools:
                print("=" * 50)
                print(f"{tool.name}: {tool.description}")


asyncio.run(main())

This example, unlike the final code in the repository, uses the mcp library instead of FastMCP⁴. The same idea can be used to write end-to-end tests, but each test starts a new server process, which is quite slow. In production settings, I think tests like this would be a good call, but not for our small project.

Thoughts on MCP

There is a running debate over MCP versus CLI tools. The MCP protocol provides a standardized interface for adding tools to the context of AI models, which can then decide to call these tools. The alternative is letting the AI model call CLI utilities (e.g. git, jq) directly through shell commands or custom scripts.

Armin Ronacher, for instance, argues that MCP relies too much on inference and demands too much context. The idea is that using CLI tools or writing scripts is more composable and, crucially, more reusable and verifiable when automating tasks (I highly recommend reading his post; this is a very short summary). Overall, I think Armin has a valid point. When the tool count is high, model performance on tasks that require tools degrades. Sometimes the model will simply not use the correct tools, and you need to nudge it in the right direction, which consumes more context. I also agree that you can go a long way with CLI tools (e.g. the GitHub CLI).

Nonetheless, I see some limitations in the CLI/script approach. First, many clients do not have shell access to execute commands or scripts, and executing arbitrary scripts can be dangerous. Right now, the overlap between tech-savvy people and Claude Desktop users (the most popular MCP client) is very high, so most MCP users can likely evaluate the safety of shell commands and simple scripts. If tool-calling with LLMs becomes more widespread, this issue will become more relevant. Safety can be built into MCP servers, but they’re still processes running with arbitrary permissions — you have to trust the MCP server to have implemented all the necessary guardrails. Maybe we’ll figure out how to sandbox these tools, or we’ll just keep asking the user for confirmation on each tool call.

MCP also has an advantage when the task requires long-running state. In mcp-ols, the CLI approach would have to reload the data on every call (though it would still work). All the tools in our server could be implemented as simple scripts, and in some sense it is a collection of scripts transformed into a server. Right now, I wouldn’t use this simple server to automate tasks at scale. I see it more as a pre-made collection of functions that perform one task well enough to be reused in local-first settings. You can quickly ask the AI model to perform regression analysis because the code has already been written and encapsulated in the MCP server.

There are still some rough edges in MCP servers and clients, especially clients. I’ve tested various clients, and their behaviors differed widely. Some would not include the output of tool calls in the chat context (they included only the LLM’s remarks about the output), which renders our server unusable. Most did not handle images returned by the server well, adding them as raw base64 strings to the context; this consumes a lot of tokens and provides no information to the LLM. Since MCP is still a new standard, this is expected. There is a huge number of AI models and providers, and not all are multimodal. From what I’ve seen, multimodality is also a sharp edge for now.

Conclusions

Building this simple MCP server was very interesting. I thought it would be a rather boring project, but besides teaching me about MCP, it made me reflect on LLM tool-calling, how to test such tools, and the safety of arbitrary tool calling. It was also an excuse to try many interesting open-source MCP clients. Unfortunately, Claude Desktop is still not available on Linux, but I hope this changes soon. I’ll continue to try out MCP servers and see how they can improve my workflows.

Footnotes

  1. Logistic regression is not an OLS model, but it’s common enough to be included here.

  2. Variance inflation factor (VIF) is used as an indicator of multicollinearity.

  3. An Introduction to Statistical Learning — James et al.

  4. I was trying out both libraries, so at any given point I was using one of them. It’s part of working on new things, I guess.


