Ethics & the Future (intermediate)

Big Data and AI in Investment Management

Investment firms now mine alternative data such as satellite imagery, credit-card flows, and web traffic, and feed it to machine-learning models that hunt for predictive signals. The promise is faster, broader information than any analyst could read. The peril is a new set of ethical and professional risks. Data bias can bake discrimination or survivorship bias into a signal. Overfitting lets a model memorise noise and fail out of sample, a form of model risk. Explainability suffers when a black-box model cannot justify a recommendation. Privacy and consent govern whether the data may be used at all. And the old duties of fiduciary care, suitability, and diligence still apply to a recommendation no matter how it was generated.

Why it matters

A model is only as honest as the data and the questions behind it. Machine learning feels objective because it is arithmetic at scale, but every choice of training sample, target, and feature carries human judgement and human blind spots. A signal trained on a booming decade may simply have learned that decade. A model nobody can explain is a model nobody can defend to a client or a regulator. The lesson of this node is that new tools do not retire old duties. The fiduciary still owns the recommendation, whoever or whatever produced it.

Worked examples

Scenario

A quant team builds a model that screens thousands of variables on historical data and finds a rule with a spectacular backtested return. Deployed live, it loses money almost at once. What went wrong, and which professional duty is engaged?

Solution

The rule almost certainly overfit. Searching thousands of variables for the best in-sample fit all but guarantees that some pattern looks brilliant by chance, a data-mining artefact rather than a real signal. This is model risk, and it engages the duty of diligence and reasonable basis under CFA Institute Standard V(A), Diligence and Reasonable Basis, which requires a recommendation to rest on sound research. The fix is out-of-sample and walk-forward testing, controlling for the number of trials, and demanding an economic rationale, not just a high backtest.

Note: A high backtested return from a wide search is weak evidence, not strong.

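The data-mining artefact described in this solution can be reproduced with a small simulation. The sketch below is illustrative only: it assumes 1,000 hypothetical trading rules whose daily returns are pure noise with a true mean of zero, picks the rule with the best in-sample Sharpe ratio, and then checks the same rule on the held-out half. The function names, the 252-day annualisation, and all parameter values are assumptions made for the example, not anything taken from a real strategy.

```python
import random
import statistics

TRADING_DAYS = 252  # assumed annualisation factor


def annualised_sharpe(daily_returns):
    """Annualised Sharpe ratio of daily returns, with the risk-free rate taken as zero."""
    mean = statistics.fmean(daily_returns)
    vol = statistics.stdev(daily_returns)
    return (TRADING_DAYS ** 0.5) * mean / vol


def best_of_noise(n_rules=1000, n_days=250, seed=0):
    """Select the best in-sample rule among rules that are pure noise.

    Returns (best in-sample Sharpe, the same rule's out-of-sample Sharpe,
    median in-sample Sharpe across all rules).
    """
    rng = random.Random(seed)
    half = n_days // 2
    results = []
    for _ in range(n_rules):
        # Every rule has zero true edge: returns are i.i.d. noise.
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        in_sample = annualised_sharpe(daily[:half])
        out_of_sample = annualised_sharpe(daily[half:])
        results.append((in_sample, out_of_sample))
    # "Ship" the rule with the best backtest, as the quant team did.
    best_in, best_out = max(results, key=lambda pair: pair[0])
    median_in = statistics.median(pair[0] for pair in results)
    return best_in, best_out, median_in


if __name__ == "__main__":
    best_in, best_out, median_in = best_of_noise()
    print(f"best in-sample Sharpe:       {best_in:.2f}")
    print(f"same rule out of sample:     {best_out:.2f}")
    print(f"median in-sample Sharpe:     {median_in:.2f}")
```

Because no rule has any real edge, the winner's in-sample Sharpe ratio reflects only the width of the search, and its out-of-sample figure is just another random draw centred on zero. Splitting the data in half is the simplest form of out-of-sample test; walk-forward testing repeats the same idea over rolling windows.
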
Scenario

A firm wants to buy a dataset of individuals’ location and spending histories scraped from mobile apps, without those individuals’ clear consent, to predict retailer revenues. What should the investment professional weigh?

Solution

The professional must weigh privacy and consent before any signal value. Using personal data gathered without proper consent can violate data-protection law and the duty to act with integrity, even if the data is predictive. The questions are whether collection was lawful and consented, whether use is proportionate, and whether the firm can stand behind the practice publicly. Predictive power does not cure a tainted source. Where consent or legality is in doubt, the data should not be used.

Note: Material value never overrides a duty of lawful, ethical sourcing.
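The three questions in this solution can be written down as an explicit sourcing gate that runs before any signal research begins. The sketch below is a hypothetical illustration: the `DatasetProvenance` fields, the function names, and the example dataset are all assumptions, and a real firm would rely on legal and compliance review rather than four boolean flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProvenance:
    """Hypothetical record of how a dataset was sourced."""
    name: str
    collection_lawful: bool    # collected in line with applicable data-protection law
    consent_documented: bool   # individuals gave clear, recorded consent
    use_proportionate: bool    # intended use is proportionate to the purpose
    publicly_defensible: bool  # the firm could stand behind the practice in public


def sourcing_concerns(p: DatasetProvenance) -> list[str]:
    """Return the sourcing questions this dataset fails, in a fixed order."""
    checks = {
        "lawful collection": p.collection_lawful,
        "documented consent": p.consent_documented,
        "proportionate use": p.use_proportionate,
        "publicly defensible": p.publicly_defensible,
    }
    return [label for label, passed in checks.items() if not passed]


def may_use(p: DatasetProvenance) -> bool:
    """A dataset passes only if no concern remains; doubt means do not use."""
    return not sourcing_concerns(p)


if __name__ == "__main__":
    scraped = DatasetProvenance(
        name="app location and spending histories",  # hypothetical dataset
        collection_lawful=False,
        consent_documented=False,
        use_proportionate=True,
        publicly_defensible=False,
    )
    print(may_use(scraped), sourcing_concerns(scraped))
```

Predictive power is deliberately not an input to `may_use`. That design choice mirrors the point of the scenario: signal value is assessed only after the source has passed, never traded off against it.
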

Common mistakes

  • "AI models are objective because they are just maths." A model inherits the bias in its training data and the judgement in its design. Apparent objectivity can hide discrimination, survivorship bias, or a regime that no longer holds.
  • "A great backtest proves a strategy works." A wide search over many variables will almost always throw up a strong in-sample result by chance. Without out-of-sample evidence and an economic rationale, it is likely overfitting.
  • "If the model made the call, the human is off the hook." The fiduciary duty stays with the professional. Suitability, diligence, and a reasonable basis apply to the recommendation regardless of how it was produced.
  • "More data always means better decisions." Bigger datasets add noise, leakage, and privacy exposure as well as signal. Quality, provenance, and consent matter more than raw volume.
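Survivorship bias, mentioned in the first mistake above, is easy to demonstrate with simulated data. The sketch below assumes a hypothetical universe of funds whose annual returns are drawn from the same distribution, and a made-up closure rule: a fund shuts the first time its cumulative value falls below 80% of its starting value. All names and parameter values are assumptions for illustration.

```python
import random
import statistics


def simulate_survivorship(n_funds=2000, n_years=10, mean=0.04, vol=0.15,
                          closure_level=0.8, seed=1):
    """Compare average returns of the full fund universe with survivors only.

    Returns (mean annual return over all recorded fund-years,
    mean annual return over surviving funds only, number of survivors).
    """
    rng = random.Random(seed)
    all_returns = []       # every return ever recorded, including closed funds
    survivor_returns = []  # returns of funds still alive at the end
    n_survivors = 0
    for _ in range(n_funds):
        wealth = 1.0
        history = []
        alive = True
        for _ in range(n_years):
            r = rng.gauss(mean, vol)
            history.append(r)
            wealth *= 1.0 + r
            if wealth < closure_level:
                alive = False  # fund closes and drops out of later databases
                break
        all_returns.extend(history)
        if alive:
            survivor_returns.extend(history)
            n_survivors += 1
    return (statistics.fmean(all_returns),
            statistics.fmean(survivor_returns),
            n_survivors)


if __name__ == "__main__":
    universe_mean, survivor_mean, n_survivors = simulate_survivorship()
    print(f"all funds, mean annual return:      {universe_mean:.2%}")
    print(f"survivors only, mean annual return: {survivor_mean:.2%}")
    print(f"survivors: {n_survivors}")
```

Every fund has the same true expected return, yet a dataset containing only the survivors shows a higher average, because the funds with poor draws were removed. A model trained on such a dataset would learn that inflated figure as if it were skill.
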

Revision bullets

  • Alternative data plus machine learning widen the information set but add new risks
  • Data bias can embed discrimination or survivorship bias in a signal
  • Overfitting is model risk: a wide search memorises noise and fails out of sample
  • Explainability matters because a black-box call cannot be defended to client or regulator
  • Privacy and consent decide whether data may be used at all
  • Fiduciary duty, suitability, and diligence still attach to any AI-assisted recommendation
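
The overfitting bullet can be made concrete with some simple arithmetic on the number of trials. The figures below are a sketch under stated assumptions: $N = 5000$ candidate signals, none with any real predictive power, each tested independently at a significance level of $\alpha = 0.05$. The last line uses the Bonferroni correction, named here as one standard and conservative way to control for the number of trials, not as a method prescribed by the sources.

```latex
\begin{aligned}
\mathbb{E}[\text{false positives}] &= N\alpha = 5000 \times 0.05 = 250,\\
\Pr(\text{at least one false positive}) &= 1-(1-\alpha)^{N} = 1 - 0.95^{5000} \approx 1,\\
\alpha_{\text{Bonferroni}} &= \frac{\alpha}{N} = \frac{0.05}{5000} = 10^{-5}.
\end{aligned}
```

Under these assumptions a search of this width is expected to produce about 250 "significant" signals from noise alone, so finding one proves very little. Real signals are rarely independent, so the exact numbers would differ, but the direction of the effect is the same.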

Quick check

A team screens 5,000 candidate signals on past data and ships the one with the best historical return. It fails immediately in live trading. What is the most likely explanation, and which duty is most directly engaged?

A manager deploys a complex machine-learning model whose recommendations cannot be explained to clients or auditors. Why is this a professional concern rather than merely a technical one?

Sources

  1. CFA Institute, Standards of Practice Handbook (2014)
    CFA Institute. Standards of Practice Handbook. 11th ed. CFA Institute, 2014.
    Grounds the fiduciary, suitability, and reasonable-basis duties that govern any AI-assisted recommendation.
  2. CFA Institute, AI Pioneers in Investment Management (2019)
    CFA Institute. AI Pioneers in Investment Management. CFA Institute, 2019.
    Surveys alternative data and machine learning in practice and the risks of bias, overfitting, explainability, and privacy.
How to cite this page
Dr. Phil's Quant Lab. (2026). Big Data and AI in Investment Management. Derivatives Atlas. https://phucnguyenvan.com/concept/im-big-data-ai