Resolved features vs raw data feeds
There are two legitimate ways to source crypto microstructure data: subscribe to raw websocket feeds (L2 books, trades, funding) and build everything yourself, or buy features already resolved point-in-time at the API seam. Neither is universally better. This page lays out the real trade-off, including the rows where the raw feed wins.
What sits between a raw feed and a usable feature
A raw feed gives you everything the venue emits and nothing else: no normalization, no history, no time discipline. To turn it into something a model can train on, four layers of work have to exist somewhere — on your payroll or inside your vendor.
A raw feed is a firehose that assumes someone is always holding it. Websockets drop, venues rate-limit and rotate symbols, snapshots desync from deltas. Capture means reconnect logic, gap accounting and deduplication running around the clock — in our case across Binance, OKX, Bybit and Hyperliquid.
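As a rough illustration of what gap accounting and deduplication involve, here is a minimal sketch. It assumes each message on a stream carries a monotonically increasing integer sequence number, which is a simplification: real venues differ in how (and whether) they expose one. The `GapTracker` class and its field names are hypothetical, not part of any venue's or vendor's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GapTracker:
    """Tracks sequence continuity for one stream (hypothetical message shape)."""

    last_seq: Optional[int] = None
    gaps: List[Tuple[int, int]] = field(default_factory=list)  # inclusive missing ranges
    dropped: int = 0  # duplicates or stale replays that were rejected

    def accept(self, seq: int) -> bool:
        """Return True if the message is new and should be stored."""
        if self.last_seq is None:
            # First message after (re)connect sets the high-water mark.
            self.last_seq = seq
            return True
        if seq <= self.last_seq:
            # At or below the high-water mark: treat as a duplicate.
            # Simplification: a late, out-of-order message is dropped too.
            self.dropped += 1
            return False
        if seq > self.last_seq + 1:
            # Record the missing range so it can be backfilled or reported.
            self.gaps.append((self.last_seq + 1, seq - 1))
        self.last_seq = seq
        return True


tracker = GapTracker()
accepted = [tracker.accept(s) for s in [1, 2, 2, 5, 4, 6]]
```

After this run, `tracker.gaps` holds the missing range `(3, 4)` and `tracker.dropped` is 2. A production capture layer would also persist the gap log and trigger a fresh snapshot on reconnect.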
Every venue speaks its own dialect: different symbols, units, depth conventions, funding intervals, timestamp semantics. Before any feature exists, all of it has to be reconciled into one canonical, consistently-timestamped series per asset and signal.
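To make the reconciliation step concrete, the sketch below maps two made-up venue dialects onto one canonical trade record. The venue names (`venue_a`, `venue_b`), their timestamp units, the contract multiplier and the raw message fields are all assumptions chosen for illustration; they do not describe any real exchange's conventions.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Trade:
    asset: str    # canonical asset, e.g. "BTC"
    ts_ms: int    # event time, UTC epoch milliseconds
    price: float
    size: float   # base-asset units


# Hypothetical per-venue conventions (illustrative values only).
VENUE_RULES: Dict[str, Dict[str, Any]] = {
    # venue_a: millisecond timestamps, sizes already in base units
    "venue_a": {"symbols": {"BTCUSDT": "BTC"}, "ts_scale": 1, "contract_size": 1.0},
    # venue_b: second timestamps, sizes quoted in contracts of 0.01 BTC
    "venue_b": {"symbols": {"BTC-USDT-SWAP": "BTC"}, "ts_scale": 1000, "contract_size": 0.01},
}


def normalize_trade(venue: str, raw: Dict[str, Any]) -> Trade:
    """Convert one raw venue message into the canonical Trade shape."""
    rules = VENUE_RULES[venue]
    return Trade(
        asset=rules["symbols"][raw["symbol"]],
        ts_ms=int(round(raw["ts"] * rules["ts_scale"])),
        price=float(raw["price"]),
        size=float(raw["size"]) * rules["contract_size"],
    )


a = normalize_trade("venue_a", {"symbol": "BTCUSDT", "ts": 1700000000000, "price": "42000.5", "size": "0.25"})
b = normalize_trade("venue_b", {"symbol": "BTC-USDT-SWAP", "ts": 1700000000, "price": 42000.5, "size": 25})
```

Both calls describe the same trade, and both produce the same asset, timestamp and size once normalized. Keeping the rules in data rather than in branching code makes it easier to audit what each venue's dialect is assumed to be.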
Point-in-time resolution is the step most pipelines get wrong. Every value must be computed strictly from data stamped at or before the requested as_of. Our serving store caps every read at ts ≤ as_of, so a later row cannot leak into a backtest. The full discipline is documented on /methodology.
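The ts ≤ as_of cap can be sketched in a few lines. This is a toy in-memory version under stated assumptions, not a description of the actual serving store: `PointInTimeSeries`, its `window` and `mean` methods, and the trailing-window convention (as_of − lookback, as_of] are all hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, List, Optional, Tuple


class PointInTimeSeries:
    """Toy series whose reads never see rows stamped after as_of."""

    def __init__(self, rows: Iterable[Tuple[float, float]]):
        ordered = sorted(rows, key=lambda r: r[0])  # sort by timestamp
        self._ts: List[float] = [ts for ts, _ in ordered]
        self._values: List[float] = [v for _, v in ordered]

    def window(self, as_of: float, lookback: float) -> List[float]:
        """Values with as_of - lookback < ts <= as_of."""
        hi = bisect_right(self._ts, as_of)             # first index with ts > as_of
        lo = bisect_right(self._ts, as_of - lookback)  # first index inside the window
        return self._values[lo:hi]

    def mean(self, as_of: float, lookback: float) -> Optional[float]:
        vals = self.window(as_of, lookback)
        return sum(vals) / len(vals) if vals else None


series = PointInTimeSeries([(10, 1.0), (20, 2.0), (30, 3.0), (40, 100.0)])
feature_at_30 = series.mean(as_of=30, lookback=20)
```

The outlier stamped at ts = 40 cannot influence `feature_at_30`, because the upper slice bound is derived from `as_of` before any value is read. The point of the design is that the cap lives in the read path itself rather than in each feature's code.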
The subtle failure mode of in-house pipelines is a live path and a backfill path that quietly drift apart. In our resolver, a live call is literally a historical call with as_of = now (same resolver, same transforms), so what the model trained on is what it gets in production.
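One way to keep the two paths from drifting is to give the live path no logic of its own. The sketch below assumes a simple "last value at or before as_of" feature; the function names `resolve` and `resolve_live` and the injectable `clock` parameter are illustrative choices, not the vendor's API.

```python
import time
from bisect import bisect_right
from typing import Callable, Optional, Sequence


def resolve(ts: Sequence[float], values: Sequence[float], as_of: float) -> Optional[float]:
    """Historical path: last value stamped at or before as_of (ts must be sorted)."""
    i = bisect_right(ts, as_of)
    return values[i - 1] if i else None


def resolve_live(
    ts: Sequence[float],
    values: Sequence[float],
    clock: Callable[[], float] = time.time,
) -> Optional[float]:
    """Live path: delegates to the historical path with as_of = now."""
    return resolve(ts, values, as_of=clock())


ts = [1.0, 2.0, 3.0]
vals = [10.0, 20.0, 30.0]
historical = resolve(ts, vals, as_of=2.5)
live = resolve_live(ts, vals, clock=lambda: 2.5)  # fixed clock for a repeatable check
```

With the clock pinned to 2.5, `live` and `historical` are the same value by construction. Passing the clock in as a parameter is what makes that equivalence testable.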
What you own vs what we resolve
A comparison table where the vendor wins every row is marketing, not analysis. Raw feeds genuinely win on control, latency and granularity — if those rows decide your use case, you should run raw feeds.
| DIMENSION | VERDICT | RAW FEED: YOU OWN THE PIPELINE | RESOLVED FEATURES: WE OWN IT |
|---|---|---|---|
| Schema & flexibility | RAW WINS | Entirely yours. Any shape, any encoding, any feature you can imagine. | A fixed catalog of signals, windows (1s to 24h) and transforms. Expressive, but defined by us. |
| Latency floor | RAW WINS | Your colocation, your network stack: as low as you are willing to pay for. | An HTTPS API round-trip. Built for bar-level models, the wrong tool for HFT execution. |
| Granularity | RAW WINS | Every tick, every L2 delta, exactly as the venue emitted it. | Resolved values at defined windows, not raw ticks. Aggregation choices are ours. |
| Vendor dependency | RAW WINS | None beyond the venues themselves. | You depend on our uptime and roadmap. Exports you make are yours to keep, under license. |
| Engineering cost | RESOLVED WINS | Capture, storage, normalization, monitoring and the point-in-time discipline are your headcount, indefinitely. | Included in the subscription. Your team spends its time on models, not plumbing. |
| Look-ahead safety | RESOLVED WINS | Yours to design, enforce and prove. It is the hardest part to get right and the easiest to get silently wrong. | Enforced by construction (ts ≤ as_of on every read), with a public protocol to falsify it. |
| Live / backtest parity | RESOLVED WINS | Two pipelines to keep byte-identical, forever. | One resolver answers both; live is historical with as_of = now. |
| Historical depth | HONEST TIE | Exactly what you have recorded, or what you can buy and trust. | Our live archive is young and we say so; every analytics response declares its real covered window. |
Buy raw when…
- You have a dedicated data-engineering team — and keeping it is part of your edge.
- Your signals need tick-level or custom L2 constructions no catalog will ever ship.
- Execution latency matters more than research throughput.
- Full schema control and zero vendor dependency are hard requirements.
Buy resolved when…
- You want leak-free inputs ready to model on, not a pipeline project.
- Your horizons are bar-level — seconds to daily — and the signal catalog covers what you trade.
- Live/backtest parity matters more to you than schema control.
- Your team’s time is better spent on models than on reconnect logic.
In practice many desks do both: raw feeds where execution lives, resolved features where research lives. If you are weighing the leak-free question specifically, start with the leak-free backtesting guide — it walks through the failure modes the point-in-time discipline exists to prevent.
“Resolved point-in-time” is a checkable claim, not a slogan. Record a live response at time T, replay the same keys historically with as_of = T, and compare — if they ever differ, our claim is broken and we treat it as a critical bug. The full protocol is on /methodology, and the sample needs no account at all.
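The record-and-replay check described above could be scripted roughly as follows. This is a sketch under assumptions: the recorded response shape (`as_of` plus a `values` mapping), the `replay` callable and the example keys are hypothetical stand-ins for whatever client you use to call the API; the authoritative protocol is the one on /methodology.

```python
from typing import Any, Callable, Dict, List

_MISSING = object()  # sentinel so an absent key never equals a recorded None


def verify_replay(
    recorded: Dict[str, Any],
    replay: Callable[[List[str], float], Dict[str, Any]],
) -> List[str]:
    """Return the keys whose replayed value differs from the recorded live value.

    recorded: {"as_of": T, "values": {key: value}} captured from a live call.
    replay:   hypothetical function that fetches the same keys with as_of = T.
    """
    as_of = recorded["as_of"]
    live = recorded["values"]
    replayed = replay(sorted(live), as_of)
    # Exact comparison on purpose: the claim is identical values, not close ones.
    # Note that a NaN would be reported as a mismatch, since NaN != NaN.
    return [k for k in sorted(live) if replayed.get(k, _MISSING) != live[k]]


# Example with stubbed data (keys and numbers are made up).
recorded = {"as_of": 1700000000.0, "values": {"btc.funding": 0.0001, "btc.imbalance_1m": 0.42}}
mismatches = verify_replay(recorded, lambda keys, as_of: {"btc.funding": 0.0001, "btc.imbalance_1m": 0.42})
```

An empty `mismatches` list means the replay reproduced the live response for every key. Any non-empty result names the keys to report.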
Every new account starts with a 14-day trial of the Signal plan — no card required. Browse the signal catalog, check pricing, and run the verification protocol before you pay anything.