-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Keyword Extraction Fix and OpenAI bug free #408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…e Internship keyword, Also made keyword extraction more robust
WalkthroughThe changes update environment variable handling for OpenAI API keys, modify prompt instructions for job and resume extraction, adjust schema field naming conventions, and refine how extracted keywords are serialized and stored. Default models and parameters for OpenAI providers are updated, and configuration settings are extended to include an optional API key field. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30-60 minutes Suggested reviewers
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (3)
apps/backend/app/agent/providers/openai.py (3)
18-24
:api_key
arg is accepted but discardedThe constructor signature still exposes
api_key
, yet the first line overwrites it withos.getenv
. This is misleading and blocks dependency-injection/testing.- def __init__(self, api_key: None, model: str = "gpt-4o"): - api_key = os.getenv("OPENAI_API_KEY") + def __init__(self, model: str = "gpt-4o", *, api_key: str | None = None): + api_key = api_key or os.getenv("OPENAI_API_KEY")Do the same for
OpenAIEmbeddingProvider
, or drop the parameter entirely.
26-37
: Incorrect endpoint – will raise at runtime
self._client.responses.create(...)
andresponse.output_text
are not part of the official OpenAI Python v1 SDK. Usechat.completions.create
(orcompletions.create
) and extractresponse.choices[0].message.content
(or.text
) instead.Example fix:
- response = self._client.responses.create( - model=self.model, - instructions=self.instructions, - input=prompt, - **options, - ) - return response.output_text + response = self._client.chat.completions.create( + model=self.model, + messages=[ + {"role": "system", "content": self.instructions}, + {"role": "user", "content": prompt}, + ], + **options, + ) + return response.choices[0].message.contentThis is a blocking issue—current code will crash on first invocation.
49-57
: Same silent-ignore pattern for embeddings
api_key
argument is again ignored. Align with the fix suggested for the chat provider to enable explicit key passing or remove the parameter altogether.
🧹 Nitpick comments (3)
apps/backend/app/prompt/structured_job.py (1)
8-9
: Clarify internship rule & make it case-insensitiveConsider re-phrasing:
- If employmentType contains the word "internship" (case-insensitive), set the value to "Internship".
This avoids ambiguity and matches the enum in the schema.
apps/backend/app/schemas/pydantic/structured_job.py (1)
97-98
: Verify compatibility after alias removalWith the alias gone, any code still emitting
extractedKeywords
will now fail validation.
If backward compatibility is needed, keep the alias or setpopulate_by_name=True
and accept both keys for a transition period.apps/backend/app/agent/providers/openai.py (1)
10-13
: Avoid loading .env at import timeLoading environment variables in every import can mask runtime-injected values and slows cold-starts. Prefer calling
load_dotenv()
once in the main entry-point or behind a guard.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
apps/backend/app/agent/providers/openai.py
(3 hunks)apps/backend/app/core/config.py
(1 hunks)apps/backend/app/prompt/structured_job.py
(1 hunks)apps/backend/app/prompt/structured_resume.py
(1 hunks)apps/backend/app/schemas/json/structured_job.py
(1 hunks)apps/backend/app/schemas/pydantic/structured_job.py
(1 hunks)apps/backend/app/services/resume_service.py
(0 hunks)
💤 Files with no reviewable changes (1)
- apps/backend/app/services/resume_service.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
apps/backend/app/agent/providers/openai.py (1)
apps/backend/app/agent/providers/base.py (1)
Provider
(5-11)
🔇 Additional comments (1)
apps/backend/app/schemas/json/structured_job.py (1)
45-48
: No lingeringextractedKeywords
references found
I ran both
rg -n '"extractedKeywords"'
rg -n extractedKeywords
and confirmed there are no remaining camelCase usages. All keys now match the snake_case Pydantic model.
def __init__(self, api_key: None, model: str = "gpt-4o"): | ||
api_key = os.getenv("OPENAI_API_KEY") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you keeping the api_key
parameter if it's always going to be overwritten by the environment variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my bad, it was supposed to be the api_key parameter "or" the environment variable, but during testing i removed the parameter for some reasons and kept only environment variable, looks like i forgot to add it again but it doesn't really matter
from dotenv import load_dotenv | ||
|
||
load_dotenv() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be needed here
@@ -42,7 +42,7 @@ | |||
"applyLink": "string", | |||
"contactEmail": "Optional[string]", | |||
}, | |||
"extractedKeywords": [ | |||
"extracted_keywords": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since everywhere in the json we are using camel case, any reason why we want to use snake case here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since everywhere in the json we are using camel case, any reason why we want to use snake case here?
no reason
Co-authored-by: Rahul Veer Singh <[email protected]>
Pull Request Title
Keyword Extraction Fix and OpenAI bug free
Related Issue
#386
Description
Solves openAI's embedding subscript error, any keywords extraction missed by LLM while parsing resume and fixing employmentType parsing error in job description.
Type
Proposed Changes
Summary by CodeRabbit
New Features
Improvements