ads-bib

ads-bib takes a NASA ADS search query and produces a normalized, curated dataset with disambiguated author names (author name disambiguation, AND, via ads-and), topic models (via BERTopic or Toponymy), and citation networks ready for tools such as Gephi, CiteSpace, or VOSviewer. The pipeline runs locally or via API.

Quickstart

Install with uv on Python 3.12 (plain pip also works).

uv pip install ads-bib
# or: pip install ads-bib

Create a .env file in your project root with the relevant API keys.

ADS_TOKEN=your-ads-token           # required
OPENROUTER_API_KEY=your-key        # only for the openrouter road
HF_TOKEN=your-key                  # only for the huggingface road
MODAL_TOKEN_ID=your-modal-id       # only for AND with backend=modal
MODAL_TOKEN_SECRET=your-modal-secret

Where to get the keys: ADS user token settings | OpenRouter Keys | Hugging Face Access Tokens | Modal.
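
Before starting a run, it can help to check that the .env file holds the keys your setup needs. The following is a minimal sketch, not part of ads-bib: the helper names `parse_env` and `missing_keys` are hypothetical, the road-to-key mapping is inferred from the comments in the example above (the local roads are assumed to need no extra key), and the inline-comment handling is an assumption about how the file is written rather than a statement of how ads-bib reads it.

```python
from pathlib import Path

# Assumed mapping from runtime road to the extra keys it needs, read off the
# comments in the .env example; ADS_TOKEN is always required.
ROAD_KEYS = {
    "openrouter": ["OPENROUTER_API_KEY"],
    "hf_api": ["HF_TOKEN"],
    "local_cpu": [],
    "local_gpu": [],
}
MODAL_KEYS = ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"]


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and '#' comment lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "token   # required".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(env, road, modal_backend=False):
    """Return the keys that are absent or empty for the chosen setup."""
    needed = ["ADS_TOKEN"] + ROAD_KEYS[road]
    if modal_backend:
        needed += MODAL_KEYS
    return [key for key in needed if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print(missing_keys(env, "openrouter"))
```

An empty list means every key assumed for that road is present and non-empty.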

Then run in your terminal:

ads-bib run --preset openrouter --set search.query='author:"Hawking, S*"'

Author name disambiguation is off by default. Enable the local CPU/GPU path with --set author_disambiguation.enabled=true; use --set author_disambiguation.backend=modal only when your Modal credentials are configured.
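
If you launch runs from a script, the command above can be assembled as an argument list, which avoids shell quoting problems with the query. This is a sketch under assumptions: `build_run_command` is a hypothetical helper, it uses only the flags and config keys shown above, and it assumes that several `--set` overrides can be combined in one call.

```python
import shlex


def build_run_command(preset, query, enable_and=False, and_backend=None):
    """Return the argv list for an `ads-bib run` call with --set overrides."""
    argv = ["ads-bib", "run", "--preset", preset, "--set", f"search.query={query}"]
    if enable_and:
        # Author name disambiguation is off by default.
        argv += ["--set", "author_disambiguation.enabled=true"]
    if and_backend is not None:
        # For example "modal", which needs Modal credentials in .env.
        argv += ["--set", f"author_disambiguation.backend={and_backend}"]
    return argv


argv = build_run_command("openrouter", 'author:"Hawking, S*"', enable_and=True)
# Print a shell-quoted version for inspection or copy-paste.
print(shlex.join(argv))
```

The list can be passed directly to `subprocess.run(argv, check=True)`, so the query never goes through a shell.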

Pick Your Runtime Road

Cloud, smallest local footprint?  →  openrouter
Hugging Face stack preferred?     →  hf_api
CPU, offline-friendly?            →  local_cpu
NVIDIA / CUDA available?          →  local_gpu

See Runtime Roads for hardware, keys, and cost trade-offs.

What the Package Adds

A raw ADS export gives you metadata in mixed languages, with no thematic structure and no network files. ads-bib translates the metadata into a common language, assigns topics, disambiguates author names (AND), and exports citation networks, end to end from one CLI command.

graph LR
    A[Search & Export] --> B[Translate & Tokenize]
    B --> C[AND]
    C --> D[Topic Modeling]
    D --> E[Curation]
    E --> F[Citation Networks]

Run Outputs

A completed run writes:

runs/run_20260407_120000_ads_bib_openrouter/
├── config_used.yaml          # exact config, reusable as CLI input
├── run_summary.yaml          # run metadata, counts, costs
├── logs/runtime.log
├── data/
│   ├── search/               # ADS search result for export variants
│   ├── export/               # pre-translation frames
│   ├── translated/           # translated frames
│   ├── tokenized/            # tokenized frames
│   ├── and/                  # disambiguated frames and diagnostics
│   ├── dataset/              # final publications, references, topic_info, manifest
│   └── citations/            # GEXF/CSV/JSON networks and WOS export
└── plots/topic_map.html

Figure: Interactive topic map for author:"Hawking, S*", rendered with datamapplot and produced by a single ads-bib run. Hover for metadata, Shift+drag to lasso a word cloud or filter by year, and click topics to isolate them.

Figure: Interactive author co-citation network for author:"Hawking, S*", exported by one ads-bib run and opened in Gephi Lite.

See Output Artifacts for what each file contains.
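
To find the most recent run and confirm that its main outputs exist, a small script along these lines may be enough. It is a sketch, not an ads-bib API: `latest_run` and `missing_outputs` are hypothetical helpers, the expected entries are copied from the directory tree above, and it assumes that the timestamped run directory names sort chronologically. Which entries a run writes may depend on its configuration, for example `data/and/` presumably only appears when AND is enabled, so it is left out here.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to one run directory.
EXPECTED = [
    "config_used.yaml",
    "run_summary.yaml",
    "logs/runtime.log",
    "data/dataset",
    "data/citations",
    "plots/topic_map.html",
]


def latest_run(runs_root):
    """Return the newest run directory under runs_root, or None.

    Assumes the run_YYYYMMDD_HHMMSS_ prefix sorts chronologically by name.
    """
    runs = sorted(p for p in Path(runs_root).glob("run_*") if p.is_dir())
    return runs[-1] if runs else None


def missing_outputs(run_dir):
    """List expected entries that are not present in run_dir."""
    return [rel for rel in EXPECTED if not (Path(run_dir) / rel).exists()]


if __name__ == "__main__":
    run = latest_run("runs")
    if run is None:
        print("no runs found")
    else:
        print(run, missing_outputs(run))
```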

For topic tuning and network exports, continue from there to Topic Modeling and Citation Networks.
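
Before opening an exported network in Gephi, a quick sanity check of its size can be done with the Python standard library. The sketch below counts `node` and `edge` elements in a GEXF file. It relies only on the GEXF format itself, not on any ads-bib function; `gexf_counts` is a hypothetical helper, and the actual file names inside `data/citations/` are not assumed here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}node'."""
    return tag.rsplit("}", 1)[-1]


def gexf_counts(source):
    """Return (node_count, edge_count) for a GEXF file path or file object."""
    nodes = edges = 0
    for element in ET.parse(source).iter():
        name = local_name(element.tag)
        if name == "node":
            nodes += 1
        elif name == "edge":
            edges += 1
    return nodes, edges
```

Call it with the path of one of the GEXF files from a run, for example `gexf_counts(path_to_gexf)`, where `path_to_gexf` is whatever file you pick from `data/citations/`.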

How to Cite

If you use this package in research, cite the software metadata in CITATION.cff.