Why Your Legal AI Should Never See Your Client Data

How Venice AI's private inference and per-user isolated hardware keep attorney-client privilege intact.

recess.legal · 7 min read

The Question Every Attorney Should Ask Their AI Vendor

Before you upload a single page of medical records, deposition transcripts, or client communications to any AI platform, ask this question: "Where does my data go after you process it?"

If the vendor pauses, hedges, or points you to a 40-page privacy policy — that is your answer.

Most legal AI tools run on shared cloud infrastructure. Your client's records sit on the same servers, processed by the same model instances, that handle every other customer's data. The provider may promise they do not "train on your data," but the architecture itself creates risks that no privacy policy can eliminate.

When a shared model processes your client's traumatic brain injury records at 2 PM and another firm's trade secret case at 2:01 PM, the infrastructure does not know or care about attorney-client privilege. It is simply processing tokens.

That is not a theoretical risk. It is an architectural decision that most vendors made because shared infrastructure is cheaper.

What Private Inference Actually Means

Private inference is a fundamentally different approach to running AI models. Instead of routing your data through shared infrastructure, the model runs on isolated hardware dedicated to your request. No other customer's data touches the same compute resources while your query is being processed.

Venice AI — the inference provider that powers recess.legal — implements private inference at the hardware level. Here is what that means in practice:

Zero data retention. Your prompt and the model's response are not stored after the request completes. There is no log of your query, no cache of the response, no training dataset being assembled in the background. The data exists only for the duration of the inference call, then it is gone.

No model training on customer data. Open-source models like MiniMax M2.5 are pre-trained on public datasets. Your legal documents are never used to fine-tune, retrain, or improve the model. The model that processes your data today is the same model that processes everyone's data — but it never learns from yours.

Isolated compute per request. Each inference request runs on dedicated hardware resources. There is no multi-tenant sharing of GPU memory, no risk of data leakage between concurrent requests from different customers.

This is not a marketing claim. It is an architectural constraint enforced by the infrastructure itself.

Recess.legal uses MiniMax M2.5, an open-source large language model, through Venice AI's private inference infrastructure. The choice of an open-source model is not incidental — it is a deliberate privacy decision.

Auditable weights. The model's weights are publicly available. Independent researchers can inspect what the model knows and how it behaves. There is no black box.

No phone-home behavior. Proprietary models from OpenAI, Anthropic, and Google run on vendor-controlled infrastructure. Even when vendors offer "enterprise" tiers with better privacy terms, your data still flows through their systems, their logging infrastructure, and their operational processes. Open-source models running on third-party private inference providers create a cleaner separation.

No terms-of-service surprises. Proprietary model vendors update their terms regularly. A clause that allows "service improvement" or "safety research" using customer inputs can change the privacy calculus overnight. Open-source model licenses do not have this problem because the model itself is not a service — it is software.

Competitive inference market. Because the model is open-source, we are not locked into a single provider. If Venice AI changed its privacy posture tomorrow, we could move to another private inference provider running the same model. That optionality protects your data long-term.

How Recess.legal Implements Per-User Isolation

Private inference at the model layer is necessary but not sufficient. The application layer — the software that manages your cases, stores your documents, and orchestrates AI queries — also needs isolation. Here is how recess.legal handles it:

Isolated Agent Containers

Every user gets their own AI agent running in a dedicated Docker container. Your agent does not share memory, processes, or runtime state with any other user's agent. The containers run with all Linux capabilities dropped and no privilege escalation permitted, so even if one container were somehow compromised, it could not affect any other user's data or agent.
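Recess.legal does not publish its deployment configuration, but the hardening described above maps onto standard Docker Compose options. A minimal sketch, with an illustrative service and image name:

```yaml
services:
  agent-user-123:          # one dedicated container per user (name illustrative)
    image: recess-agent    # hypothetical image name
    cap_drop:
      - ALL                # drop every Linux capability
    security_opt:
      - no-new-privileges:true   # forbid privilege escalation (setuid, etc.)
```

With `cap_drop: ALL` and `no-new-privileges`, a process inside the container cannot acquire kernel capabilities or escalate privileges even if its own code is compromised.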

Encryption at Rest

Documents stored in the system are encrypted at rest. OAuth tokens, API credentials, and agent gateway tokens are encrypted using Fernet symmetric encryption before they touch the database. The encryption keys are managed separately from the application data.
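The token-encryption pattern described above can be sketched with the `cryptography` package's Fernet implementation. This is illustrative, not recess.legal's actual code; in production the key would live in a secrets manager, separate from the database holding the ciphertext.

```python
from cryptography.fernet import Fernet

# Illustrative only: a real deployment loads this key from a secrets
# manager kept separate from the application database.
key = Fernet.generate_key()
fernet = Fernet(key)

# Encrypt an OAuth token before it is written to the database.
token_plaintext = b"oauth-access-token-example"
token_ciphertext = fernet.encrypt(token_plaintext)

# Decrypt only when the application needs to use the token.
recovered = fernet.decrypt(token_ciphertext)
```

Fernet bundles AES encryption with an authentication tag, so a tampered ciphertext fails to decrypt rather than silently yielding garbage.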

Append-Only Audit Trail

Every significant action in the system — document uploads, AI queries, case modifications — is logged to an append-only audit table enforced by PostgreSQL triggers. The audit trail cannot be modified or deleted, even by administrators. Protected health information in audit logs is hashed using HMAC, so the audit trail is useful for compliance without exposing sensitive data.
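The PHI-hashing step can be sketched with Python's standard-library `hmac` module. The key name and log fields below are illustrative, not taken from recess.legal's code:

```python
import hashlib
import hmac

# Illustrative: the HMAC key is a server-side secret held outside the
# database, so audit rows cannot be reversed by anyone who reads them.
AUDIT_HMAC_KEY = b"server-side-secret-key"  # placeholder value

def hash_phi(value: str) -> str:
    """Keyed, deterministic hash of a PHI field for the audit log."""
    return hmac.new(AUDIT_HMAC_KEY, value.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# The same input always yields the same digest, so audit entries for one
# person can be correlated without the plaintext ever entering the log.
audit_entry = {"action": "document_upload", "subject": hash_phi("Jane Doe")}
```

Because the hash is keyed, an attacker with a copy of the audit table cannot run a dictionary attack against common names without also stealing the server-side key.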

Network Isolation

The infrastructure uses split networks. Frontend services (the web application you interact with) and backend services (databases, AI agents, document processing) run on separate Docker networks. The API service bridges both networks, but direct access from the frontend network to backend services is not possible.
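In Docker Compose terms, a split-network layout like the one described might look like this (service names are illustrative):

```yaml
networks:
  frontend:
  backend:
    internal: true            # backend network has no external connectivity

services:
  web:
    networks: [frontend]      # user-facing web application
  api:
    networks: [frontend, backend]   # the only bridge between the two
  db:
    networks: [backend]
  agent:
    networks: [backend]
```

Marking the backend network `internal` means its containers cannot reach the outside world at all; the frontend can only reach backend services through whatever endpoints the API deliberately exposes.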

ABA Model Rule 1.6 and the Duty of Competence

The American Bar Association's Model Rule 1.6(c) requires attorneys to "make reasonable efforts to prevent the inadvertent or unauthorized disclosure of, or unauthorized access to, information relating to the representation of a client."

Comment [18] to Rule 1.6 specifically addresses technology:

Factors to be considered in determining the reasonableness of the lawyer's efforts include... the sensitivity of the information, the likelihood of disclosure if additional safeguards are not employed, the cost of employing additional safeguards, the difficulty of implementing the safeguards, and the extent to which the safeguards adversely affect the lawyer's ability to represent clients.

Running client data through shared AI infrastructure — where the vendor retains logs, trains on inputs, or processes your data alongside competitors' data — is increasingly difficult to justify as "reasonable efforts" when private alternatives exist at comparable price points.

The analysis is straightforward:

  • Sensitivity of the information: Medical records, litigation strategy, and client communications are among the most sensitive categories of data an attorney handles.
  • Likelihood of disclosure without safeguards: Shared infrastructure creates surface area for data leakage through model training, logging, caching, and multi-tenant compute.
  • Cost of additional safeguards: Private inference is available at commodity pricing. The cost delta between shared and private inference is measured in fractions of a cent per query.
  • Difficulty of implementation: Zero. The attorney does not need to configure anything differently. The privacy architecture is built into the platform.

When opposing counsel asks how you handled your client's data, "we used the cheapest shared AI we could find" is not the answer you want to give.

What Attorneys Should Ask Any AI Vendor

Before adopting any AI tool for legal work, ask these five questions:

  1. Does your model train on my data? If yes, or if the answer is qualified ("not currently" or "only for safety"), walk away.

  2. Where does inference happen? Shared cloud GPU clusters are the norm. Dedicated or isolated compute is the standard you should demand.

  3. What happens to my data after processing? Zero retention is the correct answer. Any form of logging, caching, or storage of inputs and outputs introduces risk.

  4. Is the model open-source or proprietary? Open-source models are independently auditable. Proprietary models require trust in the vendor's claims.

  5. Can I audit the architecture? Any vendor that cannot explain their data flow, isolation model, and encryption approach in technical detail is asking you to trust them on faith. Faith is not a security posture.

The Bottom Line

Attorney-client privilege is not a feature you bolt on after the fact. It is either built into the architecture from the ground up, or it is not.

Recess.legal processes your documents using open-source AI models running on Venice AI's private inference infrastructure, with per-user isolated containers, encryption at rest, append-only audit trails, and network-segmented infrastructure. Your data is never retained by the model provider, never used for training, and never shared with other customers at any layer of the stack.

That is not a privacy policy. It is an engineering decision.


Ready to see how private AI works for your firm? Start your free trial — no credit card required for 7 days.
