HN Jobs

A searchable index of Hacker News “Who is hiring?” job postings.


NVIDIA

Senior Deep Learning Software Engineer, Inference

Company: NVIDIA
Roles:
  • Senior Deep Learning Software Engineer, Inference
  • Engineering Manager, Deep Learning Inference
  • DL Performance Software Engineer - LLM Inference
Role taxonomy: Software Engineering, Leadership / Management, AI / ML / Research, Lead / Manager
Specialties: Software Engineering, Leadership, LLM
Location: Remote (US)
Salary: (not listed)
Apply via: https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Deep-Learning-Software-Engineer--Inference_JR2003655
Hiring notes: (none)
Tech: ML/AI
Regions: US
Posted by: akbarnur
Posted: Nov 4, 2025
Source: Hacker News

Original posting

NVIDIA | vLLM + SGLang | Deep Learning Inference | Remote (North America preferred)

Hi everyone, I’m Akbar, Senior Manager of Deep Learning Inference Software at NVIDIA. I lead our engineering efforts around vLLM and SGLang, two of the most widely used open-source LLM inference frameworks.

We’re building teams focused on making LLM inference faster, more efficient, and more reliable at scale, from runtime and scheduling optimizations to kernel fusion, distributed serving, and continuous integration across new GPU architectures (Hopper, Blackwell, etc.).

We’re hiring for multiple roles:
  • Senior Deep Learning Software Engineer, Inference (https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSit...)
  • Engineering Manager, Deep Learning Inference (https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSit...)
  • DL Performance Software Engineer - LLM Inference (https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCar...)
  • DL Performance Software Engineer - LLM Inference (https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCar...)

These roles are remote-friendly (North America preferred) and fully focused on upstream open-source development, working directly with the maintainers and the wider AI community. If you’re excited about large-scale inference, compiler/runtime performance, and pushing GPUs to their limits, we’d love to talk.