Abstract
Large Language Models (LLMs) have shown impressive capabilities in transforming natural language questions about relational databases into SQL queries. Despite recent improvements, small LLMs struggle to handle questions involving multiple tables and complex SQL patterns under a Zero-Shot Learning (ZSL) setting. Supervised Fine-Tuning (SFT) partially compensates for the knowledge deficits in pretrained models but falls short when dealing with queries that require multi-hop reasoning. To bridge this gap, different LLM training strategies to reinforce reasoning capabilities have been proposed, ranging from leveraging a thinking process within ZSL and including reasoning traces in SFT to adopting Reinforcement Learning (RL) strategies. However, the influence of reasoning on Text2SQL performance is still largely unexplored.
This paper investigates to what extent LLM reasoning capabilities influence Text2SQL performance on four benchmark datasets. To this end, it considers the following LLM settings: (1) ZSL, with and without general-purpose reasoning; (2) SFT, with and without task-specific reasoning traces; (3) RL, exploring different reward functions, both the established EXecution accuracy (EX) and a mix with fine-grained ones that also account for the precision, recall, and cardinality of partially correct answers; (4) SFT+RL, i.e., a two-stage approach that combines SFT and RL.
The results show that general-purpose reasoning under ZSL proves ineffective in tackling complex Text2SQL cases. Small LLMs benefit from SFT with reasoning much more than larger ones. RL is generally beneficial across all tested models and datasets. The use of the fine-grained metrics turns out to be the most effective RL strategy. Thanks to RL and the novel Text2SQL rewards, the 7B Qwen-Coder-2.5 model performs on par with 400+ billion-parameter ones (including GPT-4o) on the BIRD dataset.
Key Contributions
1. Comprehensive Training Strategy Analysis
We systematically evaluate four different training approaches for Text2SQL:
- Zero-Shot Learning (ZSL) with and without general-purpose reasoning
- Supervised Fine-Tuning (SFT) with task-specific reasoning traces (an illustrative sample format follows this list)
- Reinforcement Learning (RL) with novel reward functions
- Hybrid SFT+RL approach combining both strategies
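To make the reasoning-augmented SFT setting concrete, the snippet below shows one hypothetical training sample: the field names, tag names, and schema serialization are illustrative assumptions, not the exact format used in the paper.

```python
# A hypothetical SFT sample pairing a question and schema with a reasoning trace
# and the target SQL. Field and tag names are illustrative, not the paper's format.
sft_sample = {
    "prompt": (
        "Schema:\n"
        "CREATE TABLE singer(singer_id INT, name TEXT, country TEXT);\n"
        "CREATE TABLE concert(concert_id INT, singer_id INT, year INT);\n\n"
        "Question: How many concerts did singers from France give in 2024?"
    ),
    "completion": (
        "<think>The question needs a join between concert and singer on singer_id, "
        "a filter on country = 'France' and year = 2024, then a COUNT.</think>\n"
        "<answer>SELECT COUNT(*) FROM concert c "
        "JOIN singer s ON c.singer_id = s.singer_id "
        "WHERE s.country = 'France' AND c.year = 2024;</answer>"
    ),
}
```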
2. Novel Reward Functions for Text2SQL
We introduce fine-grained reward functions based on QATCH metrics (a computation sketch follows the list):
- Cell Precision (CP): Fraction of predicted cells that are correct
- Cell Recall (CR): Fraction of target cells that are predicted
- Tuple Cardinality (TC): Ratio between the sizes of the predicted and target result sets
- Format Reward (FR): Adherence to reasoning tag structure
- Tag Count Reward (TCR): Prevention of reward hacking
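The sketch below shows one minimal way to compute the table-level metrics from executed query results, assuming each result is a list of row tuples; the function names are illustrative and do not reproduce the QATCH implementation.

```python
from collections import Counter

def _cells(rows):
    """Flatten a query result (list of row tuples) into a multiset of cell values."""
    return Counter(cell for row in rows for cell in row)

def cell_precision(pred_rows, target_rows):
    """CP: fraction of predicted cells that also appear in the target result."""
    pred, target = _cells(pred_rows), _cells(target_rows)
    if not pred:
        return 0.0
    return sum((pred & target).values()) / sum(pred.values())

def cell_recall(pred_rows, target_rows):
    """CR: fraction of target cells that are covered by the prediction."""
    pred, target = _cells(pred_rows), _cells(target_rows)
    if not target:
        return 0.0
    return sum((pred & target).values()) / sum(target.values())

def tuple_cardinality(pred_rows, target_rows):
    """TC: ratio of the smaller result size to the larger one (1.0 = same row count)."""
    if not pred_rows or not target_rows:
        return 0.0
    return min(len(pred_rows), len(target_rows)) / max(len(pred_rows), len(target_rows))

# Example: half of the predicted cells are correct, row counts match.
pred = [("Alice", 30), ("Bob", 25)]
target = [("Alice", 30), ("Carol", 41)]
assert cell_precision(pred, target) == 0.5
assert cell_recall(pred, target) == 0.5
assert tuple_cardinality(pred, target) == 1.0
```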
3. State-of-the-Art Results
Our Think2SQL-7B model achieves:
- 56.1% weighted average on the BIRD dataset
- Performance comparable to 400+ billion parameter models (including GPT-4o)
- 8.5% improvement over the base Qwen-Coder-2.5-7B model
- Superior performance on challenging multi-hop reasoning queries
4. Key Insights
- Task-specific reasoning is essential; general reasoning capabilities alone are insufficient
- Dense rewards (QATCH metrics) outperform sparse rewards (execution accuracy) in RL (see the combined-reward sketch after this list)
- Small LLMs benefit more from reasoning traces than larger models
- RL training is particularly effective for complex SQL patterns involving multiple tables
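To illustrate the dense-versus-sparse distinction, the sketch below mixes execution accuracy with the QATCH-style metrics from the earlier snippet into a single scalar reward. The weights and the format check are illustrative placeholders, not the configuration reported in the paper, and the metric helpers (cell_precision, cell_recall, tuple_cardinality) are reused from the previous sketch.

```python
import sqlite3
from collections import Counter

def execute(db_path, sql):
    """Run a SQL query against a SQLite database; return its rows, or None if it fails."""
    try:
        with sqlite3.connect(db_path) as conn:
            return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None

def text2sql_reward(db_path, pred_sql, gold_sql, format_ok):
    """Mix sparse execution accuracy with dense QATCH-style metrics (illustrative weights)."""
    pred_rows, gold_rows = execute(db_path, pred_sql), execute(db_path, gold_sql)
    if pred_rows is None or gold_rows is None:
        return 0.0  # an unexecutable query earns no table-level credit
    # Sparse signal: exact, order-insensitive match between the two result multisets.
    ex = 1.0 if Counter(map(tuple, pred_rows)) == Counter(map(tuple, gold_rows)) else 0.0
    # Dense signal: partial credit for partially correct answers.
    dense = (cell_precision(pred_rows, gold_rows)
             + cell_recall(pred_rows, gold_rows)
             + tuple_cardinality(pred_rows, gold_rows)) / 3.0
    fr = 1.0 if format_ok else 0.0  # adherence to the reasoning-tag structure
    return 0.5 * ex + 0.4 * dense + 0.1 * fr  # placeholder weights
```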
Conclusion
This work demonstrates that small, efficiently trained models can compete with much larger ones when equipped with appropriate reasoning capabilities and training strategies. The introduction of fine-grained rewards for Text2SQL RL opens new avenues for improving structured query generation.
Future directions include:
- Extending the approach to more complex database schemas
- Investigating the combination of sparse and dense rewards
- Applying the methodology to other structured generation tasks
- Developing more sophisticated schema linking techniques
Citation
@article{papicchio2025think2sql,
title={Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL},
author={Papicchio, Simone and Rossi, Simone and Cagliero, Luca and Papotti, Paolo},
journal={arXiv preprint arXiv:2504.15077},
year={2025}
}