Use Web Sources in Cortex

Web sources are the fallback when a site has no usable RSS or API endpoint. They work best when the page structure is stable enough to target with selectors.

How it Works

Add a listing-page URL plus selectors for the repeating items and links Cortex should extract.

If needed, Cortex can also open each linked article page to refine metadata such as the title or publish date.

Best For
  • Blog, newsroom, or updates pages with no usable RSS
  • Stable listing pages built around repeatable cards or articles
  • Publisher sections where metadata lives on the linked page
  • Cases where scraping is acceptable because no cleaner source exists

Web is the most flexible source type, but it also has the highest maintenance cost: selectors stop matching whenever the site changes its markup. Use it only when the site offers no clean RSS feed or API endpoint and you still need Cortex to monitor a stable page structure directly.

Web Setup

Start with the narrowest listing page you can find, then identify the selectors that mark each item and its canonical link. Add page-level selectors only when the listing page does not carry enough metadata on its own.
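
As a rough illustration of how an item selector and a link selector cooperate, the sketch below uses Python's standard-library HTML parser to mimic the example values article.post and a.title from the field reference. It is not Cortex's implementation: the class ListingLinkParser, the function extract_links, and the sample markup are all hypothetical, and real CSS selector matching is far more general than the tag-and-class checks used here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingLinkParser(HTMLParser):
    """Hypothetical stand-in for item selector `article.post` + link selector `a.title`."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.article_depth = 0   # > 0 while inside a matched item container
        self.link_taken = False  # keep at most one link per item
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "article":
            if self.article_depth > 0:
                self.article_depth += 1      # nested <article> inside an item
            elif "post" in classes:
                self.article_depth = 1       # a new item candidate starts here
                self.link_taken = False
        elif tag == "a" and self.article_depth > 0 and not self.link_taken:
            href = attr.get("href")
            if "title" in classes and href:
                # Resolve relative hrefs against the listing URL.
                self.links.append(urljoin(self.base_url, href))
                self.link_taken = True

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1


def extract_links(html, base_url):
    parser = ListingLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Invented markup: one navigation link outside any item, two item cards.
SAMPLE = """
<nav><a class="title" href="/about">About</a></nav>
<article class="post"><a class="title" href="/blog/first">First</a></article>
<article class="post featured"><a class="title" href="second">Second</a></article>
"""

print(extract_links(SAMPLE, "https://example.com/blog/"))
```

The navigation link is skipped because it sits outside every matched container, which is why a narrow item selector matters more than a precise link selector.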

Key Configuration Fields

URL
Use the listing page that already gathers the items you want. A broad homepage is usually a worse starting point than a purpose-built announcements or blog index page.
Example: https://example.com/blog

Item Selector
Target the repeating card or article container on the listing page. This is the selector that defines what Cortex treats as one item candidate.
Example: article.post

Link Selector
Point at the article link inside each item container so Cortex can resolve a canonical URL for each extracted item.
Example: a.title

Wait For
Use this when cards load after the initial page render. It helps Cortex wait for the listing page structure before extraction begins.
Example: css:div.card

Page Title Selector
Use page-level selectors when the listing page only exposes short cards and you want Cortex to refine the title or publish date from the linked article page.
Example: h1

Tip: For the full field reference, see Create Source in the API docs.

Examples

These examples show the kinds of page structures Web sources usually target successfully.

Blog Listing Page

Self-linking card grid where each listing item is already the canonical article link. Use this when the listing page exposes enough metadata to avoid heavier page-level extraction.

Create Source
  • Source Name: Phantom Blog
  • URL: https://phantom.com/learn/blog
  • Item Selector: a[href*='/learn/blog/']
  • Link Selector: :self
  • Title Selector: h3
  • Date Selector: h3 + div
  • Wait For: css:a[href*='/learn/blog/']
  • Max Items per Extraction: 25

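
To make the :self link selector concrete, here is a minimal sketch under stated assumptions. The item element is itself the anchor, so its own href becomes the item URL. The class SelfLinkParser, the function extract_items, the example.com base URL, and the sample markup are all hypothetical; a plain substring check stands in for a[href*='/learn/blog/'], and the date selector is left out for brevity.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SelfLinkParser(HTMLParser):
    """Hypothetical stand-in for item selector a[href*=...] with link selector :self."""

    def __init__(self, base_url, needle, max_items):
        super().__init__()
        self.base_url = base_url
        self.needle = needle
        self.max_items = max_items
        self.current = None    # item being built while inside a matching <a>
        self.in_title = False  # True while inside the item's <h3>
        self.items = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and self.current is None and self.needle in href:
            # ":self": the item element is the link, so its own href is the URL.
            self.current = {"url": urljoin(self.base_url, href), "title": ""}
        elif tag == "h3" and self.current is not None:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title and self.current is not None:
            self.current["title"] += data

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_title = False
        elif tag == "a" and self.current is not None:
            # Honor the cap, mirroring "Max Items per Extraction".
            if len(self.items) < self.max_items:
                self.current["title"] = self.current["title"].strip()
                self.items.append(self.current)
            self.current = None
            self.in_title = False


def extract_items(html, base_url, needle="/learn/blog/", max_items=25):
    parser = SelfLinkParser(base_url, needle, max_items)
    parser.feed(html)
    return parser.items


# Invented markup: two blog cards and one unrelated link.
SAMPLE = """
<a href="/learn/blog/wallet-basics"><h3>Wallet basics</h3><div>Jan 5</div></a>
<a href="/pricing"><h3>Pricing</h3></a>
<a href="/learn/blog/staking-guide"><h3> Staking guide </h3><div>Jan 9</div></a>
"""

for item in extract_items(SAMPLE, "https://example.com/learn/blog"):
    print(item["url"], "|", item["title"])
```

Because the anchor doubles as the container, there is no separate link lookup that can fail, which is one reason this layout tends to need less maintenance than nested card structures.
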
JS-Rendered Blog Page

JavaScript-rendered listing that needs an explicit wait and keeps page-level fallback selectors. Use this when cards load late or article pages carry more reliable metadata than the index.

Page selectors: pageTitleSelector and pageDateSelector extract metadata from each linked article page, not the listing. Use these when the listing cards lack reliable dates.

Create Source
  • Source Name: Jito Blog
  • URL: https://www.jito.network/blog
  • Item Selector: a[href*='/blog/'].flex.flex-col
  • Link Selector: :self
  • Title Selector: h2
  • Date Selector: div > span
  • Wait For: css:a[href*='/blog/'].flex.flex-col
  • Page Title Selector: h1
  • Page Date Selector: meta[property='article:published_time']
  • Page Date Attribute: content
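
The pairing of a page date selector with a page date attribute can be pictured as "find this element on the article page, then read this attribute from it". The sketch below illustrates that idea with the standard library only. The names MetaAttributeParser and page_published_at and the sample page are hypothetical, and how Cortex actually parses the resulting value is not described here; ISO 8601 input is an assumption.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class MetaAttributeParser(HTMLParser):
    """Finds the first matching <meta property=...> tag and reads one attribute from it."""

    def __init__(self, prop, attribute):
        super().__init__()
        self.prop = prop
        self.attribute = attribute
        self.value = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and self.value is None and attr.get("property") == self.prop:
            self.value = attr.get(self.attribute)


def page_published_at(html, prop="article:published_time", attribute="content"):
    parser = MetaAttributeParser(prop, attribute)
    parser.feed(html)
    if not parser.value:
        return None  # no usable page-level date on this article page
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(parser.value.replace("Z", "+00:00"))


# Invented article page carrying an ISO 8601 timestamp in its metadata.
SAMPLE = """
<html><head>
<meta property="og:title" content="Example post">
<meta property="article:published_time" content="2024-03-05T12:30:00Z">
</head><body><h1>Example post</h1></body></html>
"""

published = page_published_at(SAMPLE)
print(published.astimezone(timezone.utc).isoformat())
```

Reading the date from a metadata attribute rather than visible text avoids locale-specific formats such as "5 Mar" or "March 5th", which is the main reason article pages can be a more reliable date source than listing cards.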

Troubleshooting & FAQ

The source runs, but no items are appearing.

The listing selectors are probably not matching the current page structure.

Recheck the repeating container selector first, then confirm that the link selector actually points at a usable article URL inside each matched item.

The page clearly has content, but Cortex still misses it.

The page may render its cards after the initial load.

Add a small wait or a Wait For selector so Cortex does not extract before the listing page is ready.
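
The reason a wait helps can be sketched as a generic polling loop: keep checking whether the listing structure exists, and only extract once it does or a timeout expires. This is an illustration of the general idea, not a description of how Cortex implements Wait For; the function wait_until, its parameters, and the simulated page are all hypothetical.

```python
import time


def wait_until(is_ready, timeout_s=10.0, interval_s=0.25):
    """Poll is_ready() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Simulated page: the cards only "appear" on the third check.
checks = {"count": 0}


def cards_rendered():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(cards_rendered, timeout_s=2.0, interval_s=0.01))
```

Extracting on the first check would have found nothing, which matches the symptom described above: the page clearly has content, yet no items come back.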

When should I use RSS or API Endpoint instead?

Prefer RSS or API Endpoint whenever a stable feed or JSON API already exists.

Web is usually the right choice only when scraping the page is unavoidable because the publisher exposes no feed or API to connect to.