Observational Data Engineering

Astrolyte

Astrophysical field data, structured for real workflows.

Astrolyte is a lightweight platform for public observational data. It preserves raw source records, organizes metadata, and exposes processed datasets for analysis, validation, and reuse.

Why

Why Astrolyte exists

Public astronomical data is powerful, but fragmented across archives, formats, metadata conventions, and one-off analysis flows. Astrolyte exists to make those observations easier to ingest, organize, process, and reuse without losing source truth.

Working premise

The goal is not only to analyze observations, but to build a system where raw files, metadata tables, and derived products can live together cleanly.

Operational pressure points

  • Archives expose observations in inconsistent forms.
  • Metadata quality varies by provider and lane.
  • Derived products often drift away from raw source context.

Surface

What the system keeps visible

The first Astrolyte surface is intentionally narrow: keep raw truth stable, expose processed structure, and publish curated outputs that are actually usable.

Layer 1

Raw

Original archive files, untouched source records, and retrieval context remain intact.

FITS references · archive responses · retrieval manifests

Layer 2

Processed

Cleaned tables, standardized schemas, indexed metadata, and validated observation records are generated downstream.

metadata index · parquet tables · quality checks

Layer 3

Curated

Derived features, summaries, figures, and analysis-ready exports become the reusable public surface.

figure bundles · event catalogs · download exports
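The three-layer model can be sketched in a few lines. This is a minimal illustration, not Astrolyte's implementation: the names `RawRecord`, `DerivedProduct`, `derive`, and `is_traceable`, and the use of SHA-256 content hashes as provenance keys, are all assumptions. What it shows is that a derived product never copies or mutates raw bytes. It only stores references back to them, so any product can be checked against the raw records it claims to come from.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical sketch: class names and the SHA-256 provenance scheme are
# assumptions, not the actual Astrolyte data model.


@dataclass(frozen=True)
class RawRecord:
    """Layer 1: original bytes plus retrieval context, never modified."""
    source: str    # archive the file came from
    uri: str       # retrieval location recorded at ingest time
    content: bytes

    @property
    def checksum(self) -> str:
        # Content hash used as a stable provenance key.
        return hashlib.sha256(self.content).hexdigest()


@dataclass(frozen=True)
class DerivedProduct:
    """Layer 2 or 3 output that stores references to raw records, not copies."""
    name: str
    layer: str
    parent_checksums: tuple


def derive(name: str, layer: str, parents: list) -> DerivedProduct:
    # Only the downstream layers may be produced by a derivation step.
    if layer not in ("processed", "curated"):
        raise ValueError(f"unknown derived layer: {layer!r}")
    return DerivedProduct(name, layer, tuple(p.checksum for p in parents))


def is_traceable(product: DerivedProduct, raw_index: dict) -> bool:
    # Traceable only if every parent hash is still present in the raw index.
    return all(c in raw_index for c in product.parent_checksums)


raw = RawRecord("IRIS", "example://obs-001", b"original FITS bytes")
raw_index = {raw.checksum: raw}
index_table = derive("metadata index", "processed", [raw])
print(is_traceable(index_table, raw_index))  # True
```

Keying provenance on content hashes rather than file paths means a raw file that is silently replaced upstream no longer matches, which is one way to keep derived products from drifting away from their source context.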

Lanes

Current source lanes

Three repositories currently define the Astrolyte surface: one for ingest and metadata, one for standardization and evaluation, and one for reproducible validation with archive context.

iris-solar-uv-data

IRIS solar UV

Ingest + metadata lane

Checkpoint 2026-03-25

IRIS defines the ingest and indexing backbone: raw Level 2 references stay untouched while per-OBS quicklooks, ROI time series, and event catalogs are produced downstream.

raw · FITS · processed · validated

07

Real observations

Seven real IRIS windows are exercised in the current checkpoint.

  • metadata index
  • SJI quicklook outputs

rubin-sampling

ZTF / Gaia baseline

Standardization + evaluation lane

Baseline snapshot 2026-03-09

Rubin Sampling defines the transformation layer: truth sets, live ingest, schema standardization, parquet artifacts, and baseline evaluation all stay visible as separate products.

raw · parquet · processed · validated

30

Usable baseline objects

Thirty RR Lyrae objects survive coverage and ingest filtering.

  • Gaia truth tables
  • raw light-curve parquet
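Schema standardization in a lane like this typically means mapping each source's column names onto one shared record layout before anything is written to parquet. The sketch below assumes that approach. `COLUMN_MAPS`, `STANDARD_COLUMNS`, and `standardize` are hypothetical names, the ZTF-style column names (`mjd`, `mag`, `magerr`, `filtercode`) are assumed to follow ZTF light-curve exports, and the `example_survey` columns are invented purely for illustration. Unit and time-system conversions are deliberately left out.

```python
# Hypothetical column maps: the ZTF-style names are assumed to follow ZTF
# light-curve exports; "example_survey" is invented for illustration.
COLUMN_MAPS = {
    "ztf": {"mjd": "time_mjd", "mag": "mag", "magerr": "mag_err", "filtercode": "band"},
    "example_survey": {"t": "time_mjd", "m": "mag", "dm": "mag_err", "filter": "band"},
}
STANDARD_COLUMNS = ("object_id", "source", "time_mjd", "mag", "mag_err", "band")


def standardize(rows, source, object_id):
    """Rename source columns to the shared schema; no unit or time conversion."""
    mapping = COLUMN_MAPS[source]
    out = []
    for row in rows:
        record = {"object_id": object_id, "source": source}
        for src_col, std_col in mapping.items():
            record[std_col] = row[src_col]  # KeyError if the source omits a column
        # Emit columns in a fixed order and drop anything outside the schema.
        out.append({col: record[col] for col in STANDARD_COLUMNS})
    return out


ztf_rows = [{"mjd": 58300.21, "mag": 15.42, "magerr": 0.03, "filtercode": "zg"}]
print(standardize(ztf_rows, "ztf", "rrlyr-001"))
```

Failing loudly on a missing column, rather than filling a default, keeps schema problems visible at the standardization step instead of surfacing later in evaluation.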

t-crb-project

T CrB archive workflow

Validation + context lane

Lane hardening 2026-03-08

T CrB defines the reproducibility and validation layer: clean products, figures, overlap metrics, and raw-image manifests coexist inside one explicit workflow.

raw · processed · curated · validated

174,872

Modern V points

The 2015-2025 modern V lane remains the clean operational baseline.

  • modern V clean products
  • all-cycles Vis products

Proof

Proof points

The site is grounded in real checkpoints rather than generalized capability claims.

Current checkpoint

07

Real IRIS observations exercised

Checkpointed in iris-solar-uv-data on March 25, 2026.

Current checkpoint

9

Merged event rows across seven per-OBS catalogs

The IRIS merged event table is derived from seven observation-scoped exports.
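A merged table built from observation-scoped exports needs to keep each row's observation of origin. The sketch below shows one way to do that under stated assumptions: `merge_event_catalogs`, the `obs_id` tag, and the `peak_time` field are hypothetical names, not the actual iris-solar-uv-data schema. A catalog with no events simply contributes no rows.

```python
def merge_event_catalogs(catalogs):
    """Merge per-observation event lists into one table, tagging each row.

    `catalogs` maps a (hypothetical) observation id to a list of event dicts
    that each carry a "peak_time" field; both names are illustrative.
    """
    merged = []
    for obs_id, events in catalogs.items():
        for event in events:
            row = dict(event)       # copy so the per-OBS export stays unchanged
            row["obs_id"] = obs_id  # keep observation scope explicit after merging
            merged.append(row)
    # A deterministic sort key makes the merged export reproducible across runs.
    merged.sort(key=lambda r: (r["peak_time"], r["obs_id"]))
    return merged


catalogs = {
    "obs_a": [{"peak_time": 120.0}, {"peak_time": 15.5}],
    "obs_b": [],                      # an observation with no detected events
    "obs_c": [{"peak_time": 60.0}],
}
merged = merge_event_catalogs(catalogs)
print(len(merged))  # 3
```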

Current checkpoint

30

Usable RR Lyrae baseline objects

rubin-sampling keeps the baseline visible instead of skipping straight to scaled-up claims.

Current checkpoint

24/30

Period recovery in the current baseline

Recovery results are published together with known alias and failure modes.
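A 24/30 result corresponds to an 80% recovery rate. Publishing it alongside aliases requires some rule for labeling each miss. The sketch below is one such rule, assuming periods in days, a 1% relative tolerance, factor-of-2 and factor-of-3 harmonics, and the familiar 1 cycle/day alias family of ground-based sampling. These criteria and the names `classify_period` and `recovery_fraction` are assumptions, not the published rubin-sampling criteria.

```python
def classify_period(p_true, p_rec, rel_tol=0.01, max_alias_order=2):
    """Label a recovered period (days) against a reference period (days).

    Tolerance, harmonic factors, and the 1 cycle/day alias family are assumptions.
    """
    def close(value, reference):
        return abs(value - reference) <= rel_tol * reference

    if close(p_rec, p_true):
        return "match"
    for k in (2, 3):
        # Integer multiples or fractions of the true period.
        if close(p_rec, p_true * k) or close(p_rec, p_true / k):
            return "harmonic"
    f_true = 1.0 / p_true  # frequency in cycles per day
    for n in range(1, max_alias_order + 1):
        # Daily aliases sit at f_true +/- n cycles per day.
        for f_alias in (f_true + n, abs(f_true - n)):
            if f_alias > 0 and close(p_rec, 1.0 / f_alias):
                return "daily_alias"
    return "failure"


def recovery_fraction(labels):
    # Only exact matches count as recovered; aliases are reported separately.
    return sum(label == "match" for label in labels) / len(labels)


print(classify_period(0.6, 0.375))                          # daily_alias
print(recovery_fraction(["match"] * 24 + ["failure"] * 6))  # 0.8
```

Checking harmonics before aliases is a design choice: some recovered periods satisfy both relations, and this ordering reports the simpler explanation first.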

Current checkpoint

174,872

Modern V loose observations in the T CrB workflow

This is the current clean operational lane used for modern validation work.

Current checkpoint

71

AAVSO vs ASAS-SN overlap bins

Cross-source overlap stays measurable and exportable in the same workflow.