AI & Machine Learning · 2026-04-18 · 11 min read

How to Power an AI Career Assistant with Structured Job Data

Learn how to build an AI career assistant using structured job data — covering RAG pipelines, vector embeddings, skill matching, and salary comparison.

The Promise of AI Career Assistants

Job searching is broken. Job seekers spend hours manually filtering listings, writing customized cover letters, and trying to work out whether their skills match a role and whether the salary is competitive. AI career assistants promise to fix this, but only if they're grounded in current, structured job data.

A generic LLM can give career advice based on its training data. What it can't do is tell you what Python engineers at Series B startups in Austin are earning right now, or which specific skills are mentioned in 70% of machine learning roles this month. For that, you need to connect the AI to a live, structured job data source.

This article walks through the architecture of a real AI career assistant backed by job data, covering RAG pipelines, vector embeddings, skill matching, and salary comparison.

Architecture Overview

The system has three main components:

  • Job Data Layer: A continuously updated database of structured job listings, fed by an API like JobDataLake.
  • Retrieval Layer: A vector search index that enables semantic matching between user queries and job listings.
  • Generation Layer: An LLM that synthesizes retrieved jobs and user context into helpful, personalized responses.

This is a Retrieval-Augmented Generation (RAG) architecture. The key insight is that the LLM's job is not to know about jobs — it's to reason about jobs that the retrieval layer surfaces.
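Viewed as code, the whole pipeline is a thin orchestration over three functions: understand, retrieve, synthesize. The sketch below injects the layer implementations so the flow itself stays small and testable; the names (`parseIntent`, `retrieve`, `generate`) are placeholders for the concrete implementations built later in this article.

```typescript
// Minimal RAG orchestration: each layer is injected so the flow is easy to test.
interface JobQuery { title?: string; semantic_query?: string }
interface Job { title: string; company: { name: string } }

interface RagLayers {
  parseIntent: (message: string) => Promise<JobQuery>;
  retrieve: (query: JobQuery) => Promise<Job[]>;
  generate: (message: string, jobs: Job[]) => Promise<string>;
}

async function answerCareerQuestion(message: string, layers: RagLayers): Promise<string> {
  const query = await layers.parseIntent(message); // understand the request
  const jobs = await layers.retrieve(query);       // ground it in live job data
  return layers.generate(message, jobs);           // synthesize the answer
}
```

Keeping the layers injectable also makes it easy to swap the retrieval backend or the LLM without touching the flow.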

Building the Job Data Layer

Start by setting up a regular sync from a job data API into your database. Each job needs to be stored with its full structured data — especially the fields that enable filtering: skills, salary, location, seniority, and company metadata.

// Sync jobs from the API into your database
async function syncJobs() {
  const roles = ['software engineer', 'product manager', 'data scientist', 'designer'];

  for (const role of roles) {
    const res = await fetch(
      `https://api.jobdatalake.com/v1/jobs?title=${encodeURIComponent(role)}&limit=200`,
      { headers: { 'X-API-Key': process.env.JDL_API_KEY! } }
    );
    const { jobs } = await res.json();
    await upsertJobs(jobs); // Your database upsert logic
  }
}

Salary data from JobDataLake is in thousands — salary_min: 150 means $150,000. Store it that way in your database but display it as full dollars in the UI.
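A small helper keeps that unit convention in one place so the thousands-vs-dollars conversion can't drift between storage and display. This is a sketch assuming the convention above; `formatSalaryRange` is a hypothetical name, not part of the API.

```typescript
// Salary fields arrive in thousands: salary_min: 150 means $150,000.
function toDollars(thousands: number): number {
  return thousands * 1000;
}

// Format a stored range for the UI, e.g. (150, 180) -> "$150,000 - $180,000"
function formatSalaryRange(minK: number, maxK: number): string {
  const fmt = (n: number) => `$${toDollars(n).toLocaleString('en-US')}`;
  return `${fmt(minK)} - ${fmt(maxK)}`;
}
```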

Building the Vector Search Index

For semantic job matching, you need to embed job descriptions into vector space. This lets a user say "I want a role where I'll build data pipelines" and lets the system surface jobs that mention ETL, data engineering, and similar concepts, not just jobs with those exact words.

The embedding strategy matters. Rather than embedding the raw job description (which can be long and noisy), create a condensed representation that captures the key signals:

function createJobEmbeddingText(job: Job): string {
  return [
    job.title,
    job.company.name,
    `Seniority: ${job.seniority_level}`,
    `Skills: ${job.skills.join(', ')}`,
    `Location: ${job.location.city ?? 'Remote'}`,
    // First 500 chars of description for semantic context
    job.description.slice(0, 500),
  ].join('. ');
}

async function embedJobs(jobs: Job[]) {
  const texts = jobs.map(createJobEmbeddingText);
  const embeddings = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
  });
  return jobs.map((job, i) => ({
    ...job,
    embedding: embeddings.data[i].embedding,
  }));
}

Store these embeddings in a vector database (pgvector, Pinecone, Qdrant, or Weaviate all work well). Index on skills, salary range, location, and seniority so you can combine vector similarity with structured filters.

The Retrieval Pipeline

When a user asks a question, you need to parse their intent and translate it into a retrieval query. This is where the LLM does its first job — not answering, but understanding.

async function parseUserIntent(userMessage: string): Promise<JobQuery> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `Extract job search parameters from the user's message.
        Return JSON with: title, skills (array), location, remote (bool),
        salary_min (in thousands), seniority_level, and semantic_query (free text).`
      },
      { role: 'user', content: userMessage }
    ],
    response_format: { type: 'json_object' },
  });
  return JSON.parse(response.choices[0].message.content!);
}

With the parsed intent, do a hybrid search — structured filters first, then semantic reranking:

async function retrieveJobs(query: JobQuery): Promise<Job[]> {
  // Step 1: Structured pre-filter using the API
  const params = new URLSearchParams();
  if (query.title) params.set('title', query.title);
  if (query.location) params.set('location', query.location);
  if (query.salary_min) params.set('salary_min', String(query.salary_min));
  if (query.skills?.length) params.set('skills', query.skills.join(','));

  const res = await fetch(`https://api.jobdatalake.com/v1/jobs?${params}`, {
    headers: { 'X-API-Key': process.env.JDL_API_KEY! },
  });
  const { jobs } = await res.json();

  // Step 2: Semantic reranking if we have a free-text query
  if (query.semantic_query && jobs.length > 0) {
    return await semanticRerank(jobs, query.semantic_query, 10);
  }

  return jobs.slice(0, 10);
}
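The semanticRerank step referenced above is left abstract; a minimal version embeds the free-text query with the same text-embedding-3-small model and ranks the pre-filtered jobs by cosine similarity against their stored embeddings. The sketch below shows the ranking half with the query embedding passed in, assuming each job carries the embedding computed earlier:

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank pre-filtered jobs by similarity to the query embedding; keep the top K.
function rankByQueryEmbedding<T extends { embedding: number[] }>(
  jobs: T[],
  queryEmbedding: number[],
  topK: number
): T[] {
  return [...jobs]
    .sort(
      (a, b) =>
        cosineSimilarity(b.embedding, queryEmbedding) -
        cosineSimilarity(a.embedding, queryEmbedding)
    )
    .slice(0, topK);
}
```

A full semanticRerank would first embed query.semantic_query, then delegate to a function like this. At larger scale you would push this work into the vector database's own similarity search rather than sorting in application code.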

Skill Gap Analysis

One of the most valuable features an AI career assistant can offer is skill gap analysis: given a user's current skills, what's missing for their target role?

The job data layer makes this tractable. Instead of guessing what skills a role needs, you can analyze actual job postings:

async function analyzeSkillGap(userSkills: string[], targetTitle: string) {
  // Fetch a sample of job postings for the target role
  const res = await fetch(
    `https://api.jobdatalake.com/v1/jobs?title=${encodeURIComponent(targetTitle)}&limit=50`,
    { headers: { 'X-API-Key': process.env.JDL_API_KEY! } }
  );
  const { jobs } = await res.json();

  // Count skill frequencies
  const skillCounts: Record<string, number> = {};
  for (const job of jobs) {
    for (const skill of (job.skills ?? [])) {
      skillCounts[skill] = (skillCounts[skill] ?? 0) + 1;
    }
  }

  // Rank by frequency
  const ranked = Object.entries(skillCounts)
    .sort(([, a], [, b]) => b - a)
    .map(([skill, count]) => ({
      skill,
      frequency: Math.round((count / jobs.length) * 100),
      userHas: userSkills.some(s => s.toLowerCase() === skill.toLowerCase()),
    }));

  return {
    requiredSkills: ranked.filter(s => s.frequency >= 50),
    gaps: ranked.filter(s => s.frequency >= 30 && !s.userHas),
  };
}

When you pass this analysis to the LLM alongside job listings, it can give highly specific advice: "You're missing Kubernetes, which appears in 68% of senior DevOps roles right now. Here's how you might address that gap..."
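Passing the analysis to the LLM works best as plain text it can quote directly. A sketch of that serialization step; `formatSkillGapForPrompt` is a hypothetical helper shaped around the analyzeSkillGap return value above:

```typescript
interface RankedSkill { skill: string; frequency: number; userHas: boolean }

// Flatten the skill gap analysis into prompt-ready text the LLM can cite verbatim.
function formatSkillGapForPrompt(gap: {
  requiredSkills: RankedSkill[];
  gaps: RankedSkill[];
}): string {
  const line = (s: RankedSkill) =>
    `- ${s.skill}: appears in ${s.frequency}% of postings${s.userHas ? ' (you have this)' : ''}`;
  return [
    'Most common skills for this role:',
    ...gap.requiredSkills.map(line),
    '',
    'Skills you are missing:',
    ...gap.gaps.map(line),
  ].join('\n');
}
```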

Salary Intelligence

Another killer feature: real-time salary benchmarking. Instead of citing outdated surveys, your assistant can answer "What do senior data engineers earn in Seattle?" with live market data.

async function getSalaryBenchmark(title: string, location: string) {
  const res = await fetch(
    `https://api.jobdatalake.com/v1/jobs?title=${encodeURIComponent(title)}&location=${encodeURIComponent(location)}&limit=100`,
    { headers: { 'X-API-Key': process.env.JDL_API_KEY! } }
  );
  const { jobs } = await res.json();

  // Convert from thousands to full dollars for display
  const salaries = jobs
    .filter((j: Job) => j.salary_min && j.salary_max)
    .map((j: Job) => ({ min: j.salary_min * 1000, max: j.salary_max * 1000 }));

  if (salaries.length === 0) return null;

  const avg = (arr: number[]) => arr.reduce((a, b) => a + b, 0) / arr.length;
  const median = (arr: number[]) => {
    const sorted = [...arr].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  };

  return {
    sampleSize: salaries.length,
    medianMin: median(salaries.map((s: { min: number }) => s.min)),
    medianMax: median(salaries.map((s: { max: number }) => s.max)),
    avgMin: avg(salaries.map((s: { min: number }) => s.min)),
    avgMax: avg(salaries.map((s: { max: number }) => s.max)),
  };
}

The Generation Layer: Prompting for Career Advice

With retrieved jobs and computed signals (skill gaps, salary benchmarks), the final step is synthesis. The system prompt matters a lot here:

const CAREER_ASSISTANT_SYSTEM_PROMPT = `You are an expert career advisor.
You have access to current job market data including live job listings,
salary ranges, and in-demand skills.

When answering questions:
- Ground your advice in the specific job data provided, not general assumptions
- Cite specific numbers (salary ranges, skill frequencies) when relevant
- Be honest about trade-offs — not every user's target is achievable immediately
- Offer concrete next steps, not vague encouragement
- Salary data is in full dollars (e.g., $150,000 not 150k notation)

Current job market context:
{job_listings}
{salary_benchmark}
{skill_gap_analysis}
`;

The key discipline is making the LLM cite the actual data you've provided rather than hallucinating market insights. Structured prompting with specific data points is the difference between a career assistant that's genuinely useful and one that sounds helpful but misleads.
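Filling the {job_listings}-style placeholders is plain string substitution. A minimal sketch; `buildSystemPrompt` is a hypothetical helper and assumes the context values have already been serialized to text:

```typescript
// Substitute {placeholder} slots in the system prompt with serialized market data.
// Unknown placeholders are left intact so missing context is visible, not silent.
function buildSystemPrompt(
  template: string,
  context: Record<string, string>
): string {
  return template.replace(/\{(\w+)\}/g, (match, key) => context[key] ?? match);
}
```

In practice you would call it with something like buildSystemPrompt(CAREER_ASSISTANT_SYSTEM_PROMPT, { job_listings: JSON.stringify(retrievedJobs, null, 2), ... }), keeping the serialization compact so the context fits the model's window.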

Using the MCP Server for Agentic Workflows

If you're building agentic career tools — where the AI can autonomously search for jobs, compare options, and generate application materials — the JobDataLake MCP server at https://mcp.jobdatalake.com is worth exploring. It exposes job search as a tool that LLM agents can call directly, simplifying the integration layer for agent frameworks like LangChain, AutoGPT, or custom Claude agents.

What Users Actually Want

The technical architecture matters, but the product succeeds or fails on the questions it can answer well. Based on user research, the highest-value queries for career assistants are:

  • "Am I being underpaid?" (salary benchmarking)
  • "What skills should I learn next?" (skill gap + trend analysis)
  • "Which companies are hiring for roles like mine?" (company discovery)
  • "Should I take this offer?" (offer evaluation against market)
  • "What do I need to get from [current role] to [target role]?" (career pathing)

All of these can be answered well only with live, structured job data. That's the foundation everything else rests on.

Frequently Asked Questions

How do I build an AI career assistant?

Combine a job data API (for real-time listings) with an LLM (for conversation) and vector embeddings (for semantic matching). Use structured fields like skills, salary, and seniority for filtering, and semantic search for natural language queries.

What data do AI career tools need?

At minimum: job title, company, location, salary range, required skills, seniority level, and remote type. Enriched data with normalized fields is critical for accurate skill matching and salary benchmarking.

Can I use an MCP server for a career assistant?

Yes. The JobDataLake MCP server at mcp.jobdatalake.com lets AI agents like Claude search 1M+ jobs directly. It supports five tools, including job search, job details, company profiles, and skill discovery.

Try JobDataLake

1M+ enriched job listings from 20,000+ companies. Free API key with 1,000 credits — no credit card required.