dConcept/ hk
Safe use15 min read

What not to paste into AI tools

A practical safety guide. Types of sensitive data, how to sanitise examples, when to use private or enterprise tools, and how to keep a review step that protects people and reputation.

Key takeaways

  • Treat the prompt box like a shared surface, not a private scratchpad.
  • Sanitise examples before using public tools, especially when names, strategy or client material are involved.
  • Human review still matters for facts, tone, confidentiality and numbers.

The question to ask before you paste

Ask yourself one thing. If this text leaked tomorrow, because of a vendor breach, careless forwarding, an integration nobody secured, or a model logging policy you did not read, what breaks? Careers, contracts, licences, custody of data, trust?

Public chat models are great sandboxes when the payload is hypothetical. People cross the line casually because the interface feels temporary. It is not. Prompts and outputs flow through infrastructure you do not host and have not vetted.

That does not mean "never use AI." It means defaulting to a habit of classifying the data first, before optimism about how clever the answer will be.

Data you should rarely paste raw

Direct identifiers. Names stitched to context that makes a person recoverable. Whispering “please anonymise this” without renaming things yourself is still rehearsal for a leak.

Credentials and secrets. API keys pasted once have ended up inside training data at other vendors. If a key has been pasted, treat it as compromised and rotate it.

Unreleased strategy and financial specifics. Competitive harm is asymmetric. Once secrecy is gone, you cannot put it back.

Attorney-client or medically sensitive material. Regulatory rules differ by region and sector. Assumption is not indemnity.

Third-party material. Client contracts, supplier quotes, unpublished research shared under NDA. All require permission beyond your personal convenience.

If you flinched at any of those, sanitise the example until the structure stays but the identity is gone. Often the skeleton is all the model needs.

Sanitisation that survives review

Good redaction substitutes classes. “Regional bank A vs. challenger B.” “Employee cohort 320 to 412.” Fiscal year instead of a project diary date. Keep numeric ratios if they are useful. Strip the breadcrumbs that recombine into a unique fingerprint.

Translate jargon into descriptive neutral terms. “Internal project codename Orion” becomes “upstream logistics pilot.” Enough for brainstorming, not enough to identify the company.

When you need to quote a policy, summarise the intent instead of pasting the internal PDF. Often the abbreviated intent is what clears up the model's confusion anyway.

Keep a reusable scratch template. Situation, constraints, what you have tried, what you are unsure about. A mask you can detach from any one customer's reality.

When enterprise or private setups earn their cost

Consumer accounts let you learn fast. They run out of road when you need organisational memory: audit logs, SSO, tenancy, data-processing agreements.

If your workflow touches a stable confidential corpus, the kind internal search already exposes to staff under policy, moving into a sanctioned enterprise stack with grounding and logging starts to make sense.

"Private" GPU or VPC deployments buy control. They rarely buy zero engineering. Evaluate total cost. Onboarding, retrieval architecture, evaluation harness. The headline per-token figure is the easy part.

Do not outsource the residency question. Where do bytes live, where do they travel, when are they purged? Legal will ask, and they deserve engineering answers in plain timelines.

Human review gates that defend trust

Factual grounding. Hallucinations cluster where the references are weakest. Flag claims that need citations. Attach sources upstream when the tool supports retrieval cleanly.

Tone suitability. Stakeholder nuance slips. Sarcasm does not translate across cultures. Models can also overshare empathy when the situation is tense.

Confidential drift. Pasted examples sometimes echo back with embellished details. Diff outputs against the sanitised scaffold you started with.

Numeric work. Calculators still beat models for audited sums. Treat any model arithmetic as provisional until something mechanical recomputes it.

Even a three-bullet checklist before publishing stops elegant prose from shipping wrong facts loudly.

If something went wrong anyway

Escalate early. Incident response timelines hate delayed honesty. Freeze any further automated outreach. Inventory exactly what material might have surfaced. Loop in security, legal, comms, and whoever needs to repair stakeholder trust.

Retrospectives should fix the procedure, not the people. Where was the sanctioned path slower than the shadow channel? Closing that gap reduces repeat probability more than a punitive memo.

Iterate the policy as a living FAQ. Short vignettes work better than abstract commandments people skim once during onboarding.