Skip to content

Commit d5b468c

Browse files
authored
Merge pull request #107 from codelion/fix-readurls
Update readurls plugin to read tables in web pages
2 parents 7ac44fc + 45d255e commit d5b468c

9 files changed

+8197
-9
lines changed

README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -307,8 +307,8 @@ Authorization: Bearer your_secret_api_key
307307
### readurls&memory-gpt-4o-mini on Google FRAMES Benchmark (Oct 2024)
308308
| Model | Accuracy |
309309
| ----- | -------- |
310-
| readurls&memory-gpt-4o-mini | 65.66 |
311-
| gpt-4o-mini | 50.0 |
310+
| readurls&memory-gpt-4o-mini | 61.29 |
311+
| gpt-4o-mini | 50.61 |
312312
| readurls&memory-Gemma2-9b | 30.1 |
313313
| Gemma2-9b | 5.1 |
314314
| Gemma2-27b | 30.8 |

optillm/plugins/readurls_plugin.py

Lines changed: 33 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ def extract_urls(text: str) -> List[str]:
2525
def fetch_webpage_content(url: str, max_length: int = 100000) -> str:
2626
try:
2727
headers = {
28-
'User-Agent': 'optillm/0.0.1 (hhttps://github.com/codelion/optillm)'
28+
'User-Agent': 'optillm/0.0.1 (https://github.com/codelion/optillm)'
2929
}
3030

3131
response = requests.get(url, headers=headers, timeout=10)
@@ -45,15 +45,42 @@ def fetch_webpage_content(url: str, max_length: int = 100000) -> str:
4545
for tag in ['article', 'main', 'div[role="main"]', '.main-content']:
4646
content = soup.select_one(tag)
4747
if content:
48-
text_elements.extend(content.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p']))
48+
text_elements.extend(content.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'table']))
4949
break
5050

51-
# If no main content found, fall back to all headers and paragraphs
51+
# If no main content found, fall back to all headers, paragraphs, and tables
5252
if not text_elements:
53-
text_elements = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p'])
53+
text_elements = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'table'])
5454

55-
# Extract text from elements
56-
text = ' '.join(element.get_text(strip=True) for element in text_elements)
55+
# Process all elements including tables
56+
content_parts = []
57+
58+
for element in text_elements:
59+
if element.name == 'table':
60+
# Process table
61+
table_content = []
62+
63+
# Get headers
64+
headers = element.find_all('th')
65+
if headers:
66+
header_text = ' | '.join(header.get_text(strip=True) for header in headers)
67+
table_content.append(header_text)
68+
69+
# Get rows
70+
for row in element.find_all('tr'):
71+
cells = row.find_all(['td', 'th'])
72+
if cells:
73+
row_text = ' | '.join(cell.get_text(strip=True) for cell in cells)
74+
table_content.append(row_text)
75+
76+
# Add table content with proper spacing
77+
content_parts.append('\n' + '\n'.join(table_content) + '\n')
78+
else:
79+
# Process regular text elements
80+
content_parts.append(element.get_text(strip=False))
81+
82+
# Join all content
83+
text = ' '.join(content_parts)
5784

5885
# Remove extra whitespace
5986
text = re.sub(r'\s+', ' ', text).strip()

scripts/eval_frames_benchmark.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,7 @@ def get_llm_response(prompt: str, model: str) -> str:
4444
n=1,
4545
stop=None,
4646
temperature=0.7,
47+
extra_body={"optillm_approach": "readurls&memory"}
4748
)
4849
return response.choices[0].message.content.strip()
4950

scripts/evaluation_results_gpt-4o-mini.json

Lines changed: 7418 additions & 0 deletions
Large diffs are not rendered by default.

scripts/evaluation_results_hf.co_codelion_MathCoT:F16.json

Lines changed: 242 additions & 0 deletions
Large diffs are not rendered by default.

scripts/evaluation_results_llama3.1:8b-instruct-fp16.json

Lines changed: 242 additions & 0 deletions
Large diffs are not rendered by default.

scripts/evaluation_results_mathcot.json

Lines changed: 242 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
[
2+
{
3+
"index": 0,
4+
"problem": "Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop.",
5+
"attempts": [
6+
{
7+
"attempt_number": 1,
8+
"response": "So I'm trying to solve this AIME problem about Aya's walking speeds and times. Let me read it again carefully.\n\n\"Every morning Aya goes for a 9-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of s kilometers per hour, the walk takes her 4 hours, including t minutes spent in the coffee shop. When she walks s+2 kilometers per hour, the walk takes her 2 hours and 24 minutes, including t minutes spent in the coffee shop. Suppose Aya walks at s + 1/2 kilometers per hour. Find the number of minutes the walk takes her, including the t minutes spent in the coffee shop.\"\n\nAlright, so we have two scenarios with different walking speeds and total times, which include both walking time and time spent in the coffee shop. The distance is constant at 9 kilometers.\n\nLet me define some variables:\n\n- Let s be the walking speed in kilometers per hour.\n\n- Let t be the time spent in the coffee shop in minutes.\n\nIn the first scenario:\n\n- Speed: s km/h\n\n- Total time including coffee break: 4 hours, which is 240 minutes.\n\n- Time spent walking: time = distance / speed = 9 / s hours.\n\n- But the total time is walking time plus t minutes.\n\nWait, but walking time is in hours, and t is in minutes. I need to make sure all units are consistent. Let's convert everything to minutes.\n\n1 hour = 60 minutes, so walking time in minutes is (9 / s) * 60.\n\nTotal time: walking time + t = (9 / s) * 60 + t = 240 minutes.\n\nSimilarly, in the second scenario:\n\n- Speed: s + 2 km/h\n\n- Total time: 2 hours and 24 minutes, which is 144 minutes.\n\n- Walking time: (9 / (s + 2)) * 60 minutes.\n\n- Total time: walking time + t = (9 / (s + 2)) * 60 + t = 144 minutes.\n\nSo now I have two equations:\n\n1) (9 / s)*60 + t = 240\n\n2) (9 / (s + 2))*60 + t = 144\n\nI need to solve for s and t.\n\nLet me write the equations more clearly:\n\nEquation A: (9 / s)*60 + t = 240\n\nEquation B: (9 / (s + 2))*60 + t = 144\n\nI can subtract equation B from equation A to eliminate t:\n\n[(9 / s)*60 + t] - [(9 / (s + 2))*60 + t] = 240 - 144\n\nSimplify:\n\n(9 / s)*60 - (9 / (s + 2))*60 = 96\n\nFactor out 60:\n\n60*(9/s - 9/(s + 2)) = 96\n\nDivide both sides by 12 to simplify:\n\n5*(9/s - 9/(s + 2)) = 8\n\nNow divide both sides by 5:\n\n9/s - 9/(s + 2) = 8/5\n\nHmm, this is getting a bit messy. Maybe I should try substitution or another approach.\n\nAlternatively, I can express t from equation A and plug into equation B.\n\nFrom equation A:\n\nt = 240 - (9 / s)*60\n\nPlug this into equation B:\n\n(9 / (s + 2))*60 + [240 - (9 / s)*60] = 144\n\nSimplify:\n\n(9 / (s + 2))*60 + 240 - (9 / s)*60 = 144\n\nCombine like terms:\n\n240 - 144 + (9 / (s + 2))*60 - (9 / s)*60 = 0\n\n96 + 60*(9/(s + 2) - 9/s) = 0\n\nWait, that seems similar to what I did earlier. Maybe I need to find a common denominator or simplify the fraction.\n\nLet me try to combine the fractions:\n\n9/(s + 2) - 9/s = 9*s - 9*(s + 2))/(s(s + 2)) = (9s - 9s - 18)/(s(s + 2)) = (-18)/(s(s + 2))\n\nSo,\n\n60*(-18)/(s(s + 2)) ) + 96 = 0\n\nMultiply both sides by s(s + 2):\n\n60*(-18) + 96*s*(s + 2) = 0\n\nSimplify:\n\n-1080 + 96*s^2 + 192*s = 0\n\nDivide the entire equation by 24 to simplify:\n\n-45 + 4*s^2 + 8*s = 0\n\nRearrange:\n\n4*s^2 + 8*s - 45 = 0\n\nNow, I can use the quadratic formula to solve for s:\n\ns = [-b \u00b1 sqrt(b^2 - 4ac)] / (2a)\n\nHere, a = 4, b = 8, c = -45\n\nDiscriminant D = b^2 - 4*a*c = 64 - 4*4*(-45) = 64 + 720 = 784\n\nsqrt(D) = sqrt(784) = 28\n\nSo,\n\ns = [-8 \u00b1 28]/8\n\nPossible solutions:\n\ns = (-8 + 28)/8 = 20/8 = 2.5 km/h\n\ns = (-8 - 28)/8 = -36/8 = -4.5 km/h\n\nSince speed can't be negative, s = 2.5 km/h\n\nNow, find t from equation A:\n\nt = 240 - (9 / 2.5)*60\n\nFirst, 9 / 2.5 = 3.6 hours\n\nThen, 3.6 * 60 = 216 minutes\n\nSo, t = 240 - 216 = 24 minutes\n\nNow, the question asks for the total time when she walks at s + 1/2 km/h.\n\ns + 1/2 = 2.5 + 0.5 = 3 km/h\n\nFind the walking time at 3 km/h:\n\ntime = distance / speed = 9 / 3 = 3 hours\n\nConvert to minutes: 3 * 60 = 180 minutes\n\nTotal time including coffee break: 180 + t = 180 + 24 = 204 minutes\n\nSo, the answer should be 204 minutes.\n\nBut wait, the problem says \"find the number of minutes the walk takes her, including the t minutes spent in the coffee shop.\"\n\nHmm, is the walking time already including the coffee time? Wait, no. The total time includes both walking and coffee time.\n\nWait, but in the scenario where she walks at s + 1/2 km/h, which is 3 km/h, the walking time is 3 hours, and then she spends t minutes in the coffee shop.\n\nBut according to earlier calculation, t = 24 minutes.\n\nSo total time is walking time plus t.\n\nBut walking time is 180 minutes, plus 24 minutes, totaling 204 minutes.\n\nBut let's double-check if this makes sense.\n\nFrom earlier:\n\nAt s = 2.5 km/h, total time is 240 minutes, which includes walking time of 216 minutes and t = 24 minutes.\n\nAt s + 2 = 4.5 km/h, total time is 144 minutes.\n\nWalking time at 4.5 km/h: 9 / 4.5 = 2 hours, which is 120 minutes, plus t = 24 minutes, totaling 144 minutes, which matches.\n\nSo, at s + 1/2 = 3 km/h, walking time is 3 hours = 180 minutes, plus t = 24 minutes, totaling 204 minutes.\n\nTherefore, the answer should be 204 minutes.\n\nBut the problem might expect just the walking time, not including the coffee break. Let's check.\n\nWait, no, it specifically says \"the walk takes her,\" which includes the walking time and the coffee time.\n\nWait, but \"the walk takes her\" probably refers only to the walking time, and then separately, she spends t minutes in the coffee shop.\n\nBut the problem says \"the walk takes her [time], including the t minutes spent in the coffee shop.\"\n\nWait, that's ambiguous. Let's look back at the problem statement.\n\n\"The walk takes her [time], including the t minutes spent in the coffee shop.\"\n\nSo, the total time is walking time plus t.\n\nBut in our earlier calculation, t is already added to the walking time.\n\nWait, but in the scenario at s = 2.5 km/h, total time is 240 minutes, which includes walking time of 216 minutes and t = 24 minutes.\n\nSimilarly, at s + 1/2 = 3 km/h, walking time is 180 minutes, plus t = 24 minutes, totaling 204 minutes.\n\nSo, the answer should be 204 minutes.\n\nBut perhaps I need to express it in hours or another format, but the problem asks for minutes.\n\nTherefore, the answer is 204 minutes.\n\n**Final Answer**\n\n\\[ \\boxed{204} \\]",
9+
"predicted_answer": 204
10+
}
11+
],
12+
"correct_answer": 204,
13+
"is_correct": true,
14+
"first_correct_attempt": 1
15+
}
16+
]

setup.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
setup(
44
name="optillm",
5-
version="0.0.14",
5+
version="0.0.15",
66
packages=find_packages(),
77
py_modules=['optillm'],
88
package_data={

0 commit comments

Comments
 (0)