Citation Networks
ads-bib exports four citation-network views and a WOS-format text export so
you can move directly into network analysis tools. For the full output schema
(DataFrame columns, run_summary.yaml keys, .gexf node attributes), see
Output Artifacts.
Run Artifacts at a Glance
A completed run writes its artifacts under runs/<run_id>/:
```
runs/run_20260407_120000_ads_bib_openrouter/
├── config_used.yaml                     # exact resolved config (reuse as CLI input)
├── run_summary.yaml                     # run metadata, counts, costs
├── logs/
│   └── runtime.log
├── data/
│   ├── dataset/
│   │   ├── publications.parquet         # curated publications with topics + reduced coords
│   │   ├── references.parquet           # normalized cited-reference metadata
│   │   └── topic_info.parquet           # one row per topic
│   ├── and/
│   └── citations/
│       ├── direct.gexf
│       ├── co_citation.gexf
│       ├── bibliographic_coupling.gexf
│       ├── author_co_citation.gexf
│       └── download_wos_export.txt
└── plots/
    └── topic_map.html
```
The tree is ordered the way questions tend to come up during analysis: what did I run → how did it go → what dataset do I have → which networks → which external import.
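If you script against several runs, a small helper that finds the newest run directory and lists its artifacts saves some path juggling. The sketch below is not part of ads-bib: it assumes run IDs follow the run_<YYYYMMDD>_<HHMMSS>_<suffix> pattern seen in the example and that each run matches the tree above. The helper names run_timestamp, latest_run, and artifact_paths are hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime
from pathlib import Path

# Assumption: run IDs look like run_<YYYYMMDD>_<HHMMSS>_<suffix>, inferred from
# the example tree rather than from a documented naming contract.
RUN_ID = re.compile(r"^run_(\d{8})_(\d{6})")


def run_timestamp(name: str) -> datetime | None:
    """Parse the start time encoded in a run directory name, if there is one."""
    match = RUN_ID.match(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")


def latest_run(runs_root: Path) -> Path | None:
    """Return the most recent run directory under runs_root, or None."""
    newest: tuple[datetime, Path] | None = None
    for child in runs_root.iterdir():
        stamp = run_timestamp(child.name)
        if child.is_dir() and stamp is not None:
            if newest is None or stamp > newest[0]:
                newest = (stamp, child)
    return newest[1] if newest else None


def artifact_paths(run_dir: Path) -> dict[str, Path]:
    """Expected artifact locations inside one run directory (per the tree above)."""
    dataset = run_dir / "data" / "dataset"
    citations = run_dir / "data" / "citations"
    return {
        "config": run_dir / "config_used.yaml",
        "summary": run_dir / "run_summary.yaml",
        "publications": dataset / "publications.parquet",
        "references": dataset / "references.parquet",
        "topic_info": dataset / "topic_info.parquet",
        "direct": citations / "direct.gexf",
        "co_citation": citations / "co_citation.gexf",
        "bibliographic_coupling": citations / "bibliographic_coupling.gexf",
        "author_co_citation": citations / "author_co_citation.gexf",
        "wos_export": citations / "download_wos_export.txt",
        "topic_map": run_dir / "plots" / "topic_map.html",
    }
```

For example, artifact_paths(latest_run(Path("runs")))["co_citation"] would point at the newest run's co-citation graph, provided at least one run exists.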
The Four Network Types
Direct citation (direct.gexf)
An edge exists when one paper in the corpus directly cites another.
Use it for explicit citation lineage and directional influence; this is the strictest view and contains no inferred links.
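To make the definition concrete, here is a minimal sketch (not ads-bib's implementation) that derives direct-citation edges from toy data. It assumes a hypothetical input mapping each corpus paper ID to the IDs it cites; references that point outside the corpus are dropped because an edge needs both ends in the corpus.

```python
from collections import Counter


def direct_citation_edges(references: dict[str, list[str]]) -> Counter:
    """Count directed (citing, cited) edges whose both ends are in the corpus."""
    corpus = set(references)
    edges: Counter = Counter()
    for citing, cited_list in references.items():
        for cited in cited_list:
            # Keep only in-corpus targets and ignore self-citations.
            if cited in corpus and cited != citing:
                edges[(citing, cited)] += 1
    return edges


refs = {"A": ["B", "C", "X"], "B": ["C"], "C": []}
edges = direct_citation_edges(refs)  # "X" is outside the corpus, so A -> X is dropped
```

Edges keep their (citing, cited) order, which is what makes this view directed.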
Co-citation (co_citation.gexf)
Two papers are linked when they are cited together by a later paper.
Use it for intellectual proximity, canonical pairings, and high-level field structure. Co-citation networks tend to highlight foundational works.
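A minimal sketch of the co-citation count, assuming a hypothetical mapping from each citing paper to the works it cites. It illustrates the metric rather than reproducing ads-bib's code; it counts every cited work, and whether ads-bib restricts nodes further is not covered on this page.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(references: dict[str, list[str]]) -> Counter:
    """For every unordered pair of cited works, count the citing papers that cite both."""
    pairs: Counter = Counter()
    for cited_list in references.values():
        # Deduplicate and sort so each pair counts once per citing paper
        # and always appears in the same (a, b) order.
        for a, b in combinations(sorted(set(cited_list)), 2):
            pairs[(a, b)] += 1
    return pairs


refs = {"P1": ["A", "B", "C"], "P2": ["A", "B"], "P3": ["B", "C", "B"]}
counts = co_citation_counts(refs)
```

Here A and B are co-cited by P1 and P2, so their edge weight is 2; the duplicate B in P3 does not inflate the count.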
Bibliographic coupling (bibliographic_coupling.gexf)
Two papers are linked when they share references.
Use it for contemporaneous similarity and topic-neighbor discovery among papers that may not cite each other directly.
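Bibliographic coupling is the mirror image of co-citation: it pairs the citing papers instead of the cited ones. A minimal sketch under the same assumption of a hypothetical paper-to-references mapping, not ads-bib's implementation:

```python
from collections import Counter
from itertools import combinations


def coupling_counts(references: dict[str, list[str]]) -> Counter:
    """For every unordered pair of citing papers, count the references they share."""
    ref_sets = {paper: set(refs) for paper, refs in references.items()}
    pairs: Counter = Counter()
    for a, b in combinations(sorted(ref_sets), 2):
        shared = len(ref_sets[a] & ref_sets[b])
        if shared:  # pairs with no shared reference get no edge
            pairs[(a, b)] = shared
    return pairs


refs = {"P1": ["A", "B", "C"], "P2": ["A", "B"], "P3": ["D"]}
coupling = coupling_counts(refs)
```

The pairwise loop is quadratic in the number of papers; for large corpora, an inverted index from each reference to the papers citing it avoids comparing pairs that share nothing.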
Author co-citation (author_co_citation.gexf)
Two cited (first) authors are linked when works by them appear together in the same reference list.
Use it for author-level intellectual structure, schools of thought, and recurring collaboration-adjacent pairings.
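A minimal sketch of the author-level variant: each cited work is collapsed to its first author before pairing, so two works by the same author never create a self-pair. The first_author mapping, the name format, and the toy data are hypothetical, and real pipelines need author-name normalization that this sketch skips.

```python
from collections import Counter
from itertools import combinations


def author_co_citation_counts(
    references: dict[str, list[str]],
    first_author: dict[str, str],
) -> Counter:
    """Count, for each pair of cited first authors, the citing papers that cite both."""
    pairs: Counter = Counter()
    for cited_list in references.values():
        # One set per citing paper: an author cited twice by the same
        # paper still counts once for that paper.
        authors = {first_author[c] for c in cited_list if c in first_author}
        for a, b in combinations(sorted(authors), 2):
            pairs[(a, b)] += 1
    return pairs


refs = {"P1": ["w1", "w2", "w3"], "P2": ["w1", "w3"]}
first_author = {"w1": "Doe, J.", "w2": "Doe, J.", "w3": "Roe, R."}
author_pairs = author_co_citation_counts(refs, first_author)
```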
Which Artifact for Which Task
| Goal | Best artifact |
|---|---|
| Inspect document topics | data/dataset/publications.parquet, data/dataset/topic_info.parquet, plots/topic_map.html |
| Reproduce a run | config_used.yaml, run_summary.yaml |
| Explore direct citation flow | data/citations/direct.gexf |
| Explore shared reception | data/citations/co_citation.gexf |
| Explore shared reference bases | data/citations/bibliographic_coupling.gexf |
| Explore author-level structure | data/citations/author_co_citation.gexf |
| Import into CiteSpace / VOSviewer | data/citations/download_wos_export.txt |
Common issues
- Graph looks empty in Gephi: check citations.min_counts for each metric. Raising a threshold drops weak edges, so lower it if too few edges survive. Defaults differ between the raw code defaults and the packaged presets; see Configuration → Citations.
- Wrong tool for the question: direct is directed, while co-citation and bibliographic coupling answer different questions; see the four network sections above.
- WOS import problems: confirm you are importing data/citations/download_wos_export.txt in your tool's WOS / plain-text mode. If columns misalign, check that tool's import documentation.
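When an import fails, a quick structural check of the export can help tell a malformed file from a wrong import mode. The sketch below assumes download_wos_export.txt follows the usual Web of Science plain-text tagged layout (two-character field tags, ER closing each record, EF closing the file); this page does not spell out ads-bib's exact output, so treat a False flag as a hint, not proof of a broken file. summarize_wos is a hypothetical helper.

```python
from collections import Counter

HEADER_TAGS = {"FN", "VR", "EF"}


def summarize_wos(text: str) -> dict:
    """Summarize a WOS-style tagged export: header, record count, field tags."""
    # Some tools write a UTF-8 byte-order mark before the first tag.
    lines = text.lstrip("\ufeff").splitlines()
    non_empty = [line for line in lines if line.strip()]
    tags: Counter = Counter()
    records = 0
    for line in non_empty:
        tag = line[:2]
        if tag == "ER":
            records += 1  # "ER" closes one record
        elif tag.strip() and tag not in HEADER_TAGS:
            # Field lines begin with a two-character tag; continuation
            # lines begin with spaces and are skipped here.
            tags[tag] += 1
    return {
        "has_fn_header": bool(non_empty) and non_empty[0].startswith("FN"),
        "ends_with_ef": bool(non_empty) and non_empty[-1].strip() == "EF",
        "records": records,
        "tags": tags,
    }


sample = "\n".join([
    "\ufeffFN Clarivate Analytics Web of Science", "VR 1.0",
    "PT J", "AU Doe, J", "   Roe, R", "TI A title", "CR Ref one", "   Ref two", "ER", "",
    "PT J", "AU Roe, R", "ER", "",
    "EF",
])
summary = summarize_wos(sample)
```

A record count far below the number of publications, or a missing CR tag, is a good reason to look at the file before blaming the importing tool.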
What an Exported Edge Looks Like
Every .gexf is valid XML with two blocks: a <nodes> list, where each node
carries its metadata (the full publication metadata in the three paper-level
networks), and an <edges> list, where each edge carries the metric-specific
weight. Below is a trimmed excerpt from a co_citation.gexf that shows the
structure directly:
```xml
<graph mode="static" defaultedgetype="undirected">
  <nodes>
    <node id="1974Natur.248...30H" label="Hawking, S.W. (1974)">
      <attvalues>
        <attvalue for="Bibcode" value="1974Natur.248...30H"/>
        <attvalue for="Title" value="Black hole explosions?"/>
        <attvalue for="Year" value="1974"/>
        <attvalue for="topic_id" value="2"/>
        <attvalue for="Name" value="Hawking radiation"/>
        <attvalue for="embedding_2d_x" value="-3.42"/>
        <attvalue for="embedding_2d_y" value="1.88"/>
      </attvalues>
    </node>
    <node id="1975CMaPh..43..199H" label="Hawking, S.W. (1975)"> ... </node>
  </nodes>
  <edges>
    <edge source="1974Natur.248...30H"
          target="1975CMaPh..43..199H"
          weight="7"/>
  </edges>
</graph>
```
The interpretation depends on the metric. In co_citation.gexf, weight="7"
means 7 later papers cite both papers. In bibliographic_coupling.gexf, the
same weight would mean the two papers share 7 references. In direct.gexf,
edges are directed and the weight counts how many times the source cites the
target.
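To check edge weights outside Gephi, the standard library's xml.etree.ElementTree is enough. A minimal sketch, not an ads-bib API: it strips XML namespaces instead of hard-coding a GEXF version, and it treats a missing weight as 1.0 and a missing defaultedgetype as undirected (both are assumptions about how to handle absent attributes). read_gexf_edges and weighted_degree are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a namespace prefix such as '{http://gexf.net/1.3}' from a tag."""
    return tag.rsplit("}", 1)[-1]


def read_gexf_edges(gexf_text: str) -> tuple[str, dict[tuple[str, str], float]]:
    """Return the graph's default edge type and a {(source, target): weight} map."""
    root = ET.fromstring(gexf_text)
    edge_type = "undirected"  # assumed when defaultedgetype is absent
    edges: dict[tuple[str, str], float] = {}
    for element in root.iter():
        name = local_name(element.tag)
        if name == "graph":
            edge_type = element.get("defaultedgetype", edge_type)
        elif name == "edge":
            key = (element.get("source"), element.get("target"))
            # Assumption: an edge without a weight attribute counts as 1.0.
            edges[key] = float(element.get("weight", "1"))
    return edge_type, edges


def weighted_degree(edges: dict[tuple[str, str], float]) -> dict[str, float]:
    """Sum the weights of all edges touching each node (direction ignored)."""
    degree: dict[str, float] = {}
    for (source, target), weight in edges.items():
        degree[source] = degree.get(source, 0.0) + weight
        degree[target] = degree.get(target, 0.0) + weight
    return degree


doc = (
    '<gexf xmlns="http://gexf.net/1.3" version="1.3">'
    '<graph mode="static" defaultedgetype="undirected">'
    '<nodes><node id="a" label="A"/><node id="b" label="B"/></nodes>'
    '<edges><edge source="a" target="b" weight="7"/></edges>'
    "</graph></gexf>"
)
kind, edges = read_gexf_edges(doc)
```

On a real run you would pass the file contents, e.g. Path(...).read_text(encoding="utf-8"). In co_citation.gexf, sorting weighted_degree(edges) in descending order gives a quick list of the most frequently co-cited papers.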
External Tooling
- Gephi: desktop network visualization. Opens .gexf directly and keeps every node attribute the pipeline exports.
- Gephi Lite: browser-based Gephi for quick inspection without installing the desktop app. See the embed integration guide for self-hosted iframe embeds.
- CiteSpace: imports data/citations/download_wos_export.txt (WOS format) and runs temporal bibliometric analyses.
- VOSviewer: imports the same WOS export and renders overlay-style clustering views.
Tuning Edge Density
All four networks pass through a per-metric min_counts filter before export.
The code default is 1 for each metric (keep every edge); the four packaged
presets raise those thresholds to
{direct: 2, co_citation: 3, bibliographic_coupling: 2, author_co_citation: 3}
as practical starter values for sparse, author-focused corpora. Scale the
thresholds up for denser corpora and down for sparser ones. Use
cited_authors_exclude or cited_author_uids_exclude when you want explicit
pruning on the cited-reference side. See
Configuration → Citations for the raw keys.
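The filter can be pictured as follows. This is a sketch, not ads-bib's code: it assumes an edge survives when its weight is at least the metric's min_counts value, which is consistent with a threshold of 1 keeping every edge. threshold_report is a hypothetical helper for choosing a value before editing the config.

```python
from collections import Counter

# Thresholds copied from the packaged presets described above.
PRESET_MIN_COUNTS = {
    "direct": 2,
    "co_citation": 3,
    "bibliographic_coupling": 2,
    "author_co_citation": 3,
}


def apply_min_count(weights: Counter, min_count: int = 1) -> Counter:
    """Keep only edges whose weight is at least min_count (1 keeps every edge)."""
    return Counter({edge: w for edge, w in weights.items() if w >= min_count})


def threshold_report(weights: Counter, thresholds=range(1, 6)) -> dict[int, int]:
    """Number of edges that would survive each candidate threshold."""
    return {t: sum(1 for w in weights.values() if w >= t) for t in thresholds}


weights = Counter({("a", "b"): 5, ("a", "c"): 3, ("b", "c"): 1})
kept = apply_min_count(weights, PRESET_MIN_COUNTS["co_citation"])
report = threshold_report(weights, [1, 3, 6])
```

If the report shows almost nothing surviving at the preset value, that corpus is a candidate for a lower threshold.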
For the full output schema (node attributes, DataFrame columns, run summary), continue to Output Artifacts. For the raw citation config keys, see Configuration.