Citation Networks

ads-bib exports four citation-network views and a WOS-format text export so you can move directly into network analysis tools. For the full output schema (DataFrame columns, run_summary.yaml keys, .gexf node attributes), see Output Artifacts.

Run Artifacts at a Glance

A completed run writes its artifacts under runs/<run_id>/:

runs/run_20260407_120000_ads_bib_openrouter/
├── config_used.yaml              # exact resolved config (reuse as CLI input)
├── run_summary.yaml              # run metadata, counts, costs
├── logs/
│   └── runtime.log
├── data/
│   ├── dataset/
│   │   ├── publications.parquet      # curated publications with topics + reduced coords
│   │   ├── references.parquet        # normalized cited-reference metadata
│   │   └── topic_info.parquet        # one row per topic
│   ├── and/
│   └── citations/
│       ├── direct.gexf
│       ├── co_citation.gexf
│       ├── bibliographic_coupling.gexf
│       ├── author_co_citation.gexf
│       └── download_wos_export.txt
└── plots/
    └── topic_map.html

The order of files above matches the order of questions during analysis: what did I run → how did it go → what dataset do I have → which networks → which external import.
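As a quick sanity check before analysis, the layout above can be verified with a few lines of standard-library Python. This is a minimal sketch: `missing_artifacts` is a hypothetical helper, not part of ads-bib, and it assumes every file in the tree is written on every run, which may not hold for all configurations.

```python
from pathlib import Path

# Relative paths copied from the run layout shown above.
EXPECTED_ARTIFACTS = [
    "config_used.yaml",
    "run_summary.yaml",
    "logs/runtime.log",
    "data/dataset/publications.parquet",
    "data/dataset/references.parquet",
    "data/dataset/topic_info.parquet",
    "data/citations/direct.gexf",
    "data/citations/co_citation.gexf",
    "data/citations/bibliographic_coupling.gexf",
    "data/citations/author_co_citation.gexf",
    "data/citations/download_wos_export.txt",
    "plots/topic_map.html",
]


def missing_artifacts(run_dir):
    """Return the expected relative paths that do not exist under run_dir."""
    root = Path(run_dir)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]
```

Calling `missing_artifacts("runs/run_20260407_120000_ads_bib_openrouter")` returns an empty list when every listed file is present.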

The Four Network Types

Direct citation (direct.gexf)

An edge exists when one paper in the corpus directly cites another.

Use it for explicit citation lineage and directional influence; this is the strictest view and contains no inferred links.

Co-citation (co_citation.gexf)

Two papers are linked when they are cited together by a later paper.

Use it for intellectual proximity, canonical pairings, and high-level field structure. Co-citation networks tend to highlight foundational works.
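To make the definition concrete, here is a small sketch of how co-citation weights can be counted from a citing-paper-to-references mapping. The paper IDs are toy values and `co_citation_counts` is an illustrative function, not the ads-bib implementation.

```python
from collections import Counter
from itertools import combinations

# Toy data (hypothetical IDs): citing paper -> set of papers it cites.
cites = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"B", "C"},
}


def co_citation_counts(cites):
    """For each unordered pair of cited papers, count the papers citing both."""
    counts = Counter()
    for refs in cites.values():
        # Sorting gives each pair one canonical (smaller, larger) key.
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return counts


weights = co_citation_counts(cites)
# A and B are cited together by P1 and P2, so weights[("A", "B")] == 2.
```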

Bibliographic coupling (bibliographic_coupling.gexf)

Two papers are linked when they share references.

Use it for contemporaneous similarity and topic-neighbor discovery among papers that may not cite each other directly.
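Bibliographic coupling is the mirror image of co-citation: the weight comes from the overlap of two reference lists. The sketch below uses toy IDs and an illustrative `coupling_counts` function; it is not taken from the ads-bib code.

```python
from itertools import combinations

# Toy data (hypothetical IDs): paper -> set of references it cites.
refs = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"B", "C"},
    "P4": {"D"},
}


def coupling_counts(refs):
    """Map each unordered pair of papers to the number of shared references."""
    weights = {}
    for p, q in combinations(sorted(refs), 2):
        shared = len(refs[p] & refs[q])
        if shared:  # pairs with no shared references get no edge
            weights[(p, q)] = shared
    return weights


coupling = coupling_counts(refs)
# P1 and P2 share references A and B, so coupling[("P1", "P2")] == 2.
```

P4 shares nothing with the other papers, so it ends up with no edges even though it is part of the corpus.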

Author co-citation (author_co_citation.gexf)

Two first authors are linked when their works are cited together by the same paper.

Use it for author-level intellectual structure, schools of thought, and recurring collaboration-adjacent pairings.
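The author-level view can be sketched by collapsing each cited paper to its first author before counting pairs. This is an illustration under stated assumptions: the IDs and the `first_author` mapping are toy data, and counting each author at most once per citing paper is a choice made here, not a confirmed detail of the pipeline.

```python
from collections import Counter
from itertools import combinations

# Toy data (hypothetical IDs): cited paper -> first author.
first_author = {"A": "Hawking", "B": "Hawking", "C": "Bekenstein", "D": "Unruh"}

# Citing paper -> set of cited papers.
cites = {
    "P1": {"A", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"A", "B"},
}


def author_co_citation_counts(cites, first_author):
    """Count, per unordered author pair, the papers citing works by both."""
    counts = Counter()
    for refs in cites.values():
        # Assumption: each author counts once per citing paper.
        authors = {first_author[r] for r in refs if r in first_author}
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


author_weights = author_co_citation_counts(cites, first_author)
```

P3 cites two papers by the same first author, so under this counting it contributes no edge.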

Which Artifact for Which Task

Goal | Best artifact
--- | ---
Inspect document topics | data/dataset/publications.parquet, data/dataset/topic_info.parquet, plots/topic_map.html
Reproduce a run | config_used.yaml, run_summary.yaml
Explore direct citation flow | data/citations/direct.gexf
Explore shared reception | data/citations/co_citation.gexf
Explore shared reference bases | data/citations/bibliographic_coupling.gexf
Explore author-level structure | data/citations/author_co_citation.gexf
Import into CiteSpace / VOSviewer | data/citations/download_wos_export.txt

Common issues

  • Graph looks empty in Gephi: check citations.min_counts for each metric; raising a threshold drops weak edges. Defaults differ between the raw code defaults and the packaged presets; see Configuration → Citations.
  • Wrong tool for the question: direct.gexf is directed, while the co-citation and bibliographic-coupling views answer different questions; see the four sections above.
  • WOS import problems: confirm you are using data/citations/download_wos_export.txt and your tool’s WOS/plain-text mode; see the tool's documentation if columns misalign.

What an Exported Edge Looks Like

Every .gexf is valid XML with two blocks: a <nodes> list where each node carries the full publication metadata, and an <edges> list where each edge carries the metric-specific weight. Below is a trimmed excerpt from a co_citation.gexf so you can see the structure directly:

<graph mode="static" defaultedgetype="undirected">
  <nodes>
    <node id="1974Natur.248...30H" label="Hawking, S.W. (1974)">
      <attvalues>
        <attvalue for="Bibcode" value="1974Natur.248...30H"/>
        <attvalue for="Title"   value="Black hole explosions?"/>
        <attvalue for="Year"    value="1974"/>
        <attvalue for="topic_id" value="2"/>
        <attvalue for="Name"    value="Hawking radiation"/>
        <attvalue for="embedding_2d_x" value="-3.42"/>
        <attvalue for="embedding_2d_y" value="1.88"/>
      </attvalues>
    </node>
    <node id="1975CMaPh..43..199H" label="Hawking, S.W. (1975)"> ... </node>
  </nodes>
  <edges>
    <edge source="1974Natur.248...30H"
          target="1975CMaPh..43..199H"
          weight="7"/>
  </edges>
</graph>

The interpretation depends on the metric: in co_citation.gexf, weight=7 means the two papers are jointly cited by 7 later papers. In bibliographic_coupling.gexf, the same edge would say the two papers share 7 references. In direct.gexf, the edge is directed and weight counts how many times source cites target.
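For scripted inspection outside Gephi, the edge list can be read with Python's standard library alone. This is a minimal sketch: `read_edges` and `local_name` are illustrative helpers, the sample string is a reduced version of the excerpt above, and the namespace handling assumes real files may declare a GEXF namespace on the root element.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}edge' -> 'edge'."""
    return tag.rsplit("}", 1)[-1]


def read_edges(gexf_text):
    """Return (source, target, weight) tuples from GEXF text."""
    root = ET.fromstring(gexf_text)
    edges = []
    for elem in root.iter():
        if local_name(elem.tag) == "edge":
            # A missing weight attribute is treated as 1 here.
            weight = float(elem.get("weight", "1"))
            edges.append((elem.get("source"), elem.get("target"), weight))
    return edges


sample = """<gexf><graph mode="static" defaultedgetype="undirected">
  <nodes>
    <node id="1974Natur.248...30H" label="Hawking, S.W. (1974)"/>
    <node id="1975CMaPh..43..199H" label="Hawking, S.W. (1975)"/>
  </nodes>
  <edges>
    <edge source="1974Natur.248...30H" target="1975CMaPh..43..199H" weight="7"/>
  </edges>
</graph></gexf>"""

edges = read_edges(sample)
```

For a file on disk, `ET.parse(path).getroot()` can stand in for `ET.fromstring`; the rest of the loop stays the same.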

External Tooling

  • Gephi — desktop network visualization. Opens .gexf directly and keeps every node attribute the pipeline exports.
  • Gephi Lite — browser-based Gephi for quick inspection without installing the desktop app. See the embed integration guide for self-hosted iframe embeds.
  • CiteSpace — imports data/citations/download_wos_export.txt (WOS format) and runs temporal bibliometric analyses.
  • VOSviewer — imports the same WOS export and renders overlay-style clustering views.

Tuning Edge Density

All four networks run through a per-metric min_counts filter before export. The code default is 1 for each metric (keep every edge); the four packaged presets raise those thresholds to {direct: 2, co_citation: 3, bibliographic_coupling: 2, author_co_citation: 3} as practical starter values for sparse author-focused corpora. Raise the thresholds for denser corpora and lower them for sparser ones. Use cited_authors_exclude or cited_author_uids_exclude when you want explicit pruning on the cited-reference side. See Configuration → Citations for the raw keys.
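The effect of a threshold can be previewed on any edge list before re-running the pipeline. The sketch below copies the preset values quoted above; `filter_edges` is a hypothetical helper, and treating the threshold as inclusive (weight >= min_count) is an assumption chosen to match "1 keeps every edge".

```python
# Thresholds quoted above for the packaged presets.
PRESET_MIN_COUNTS = {
    "direct": 2,
    "co_citation": 3,
    "bibliographic_coupling": 2,
    "author_co_citation": 3,
}


def filter_edges(edges, min_count):
    """Keep (source, target, weight) edges with weight >= min_count."""
    # Assumption: the threshold is inclusive.
    return [(s, t, w) for s, t, w in edges if w >= min_count]


# Toy co-citation edges (hypothetical IDs and weights).
toy_edges = [("A", "B", 1), ("A", "C", 3), ("B", "C", 5)]

kept_default = filter_edges(toy_edges, 1)
kept_preset = filter_edges(toy_edges, PRESET_MIN_COUNTS["co_citation"])
```

With the code default of 1 all three toy edges survive; with the preset value of 3 the weight-1 edge is dropped.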

For the full output schema (node attributes, DataFrame columns, run summary), continue to Output Artifacts. For the raw citation config keys, see Configuration.