
Text summarization is the process of shortening a lengthy text while retaining its main information and ideas. There are two common approaches to text summarization: extractive and abstractive.
In extractive summarization, the most important phrases, sentences, or words are extracted from the original text to form the summary. TF-IDF (Term Frequency-Inverse Document Frequency) is commonly used to identify the important words and sentences.
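To make the extractive idea concrete, here is a minimal sketch of TF-IDF sentence scoring using only the Python standard library. It rests on a few assumptions that are mine, not part of any library: each sentence is treated as its own "document" when computing the IDF, sentences are split with a naive regular expression, and the helper names (split_sentences, tokenize, extractive_summary) are made up for illustration. A real project would more likely use a proper tokenizer and a library vectorizer.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep simple alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def extractive_summary(text, num_sentences=2):
    """Return the num_sentences highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: the number of sentences in which each word appears.
    # Assumption: every sentence counts as one "document".
    df = Counter(word for tokens in tokenized for word in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sum the TF-IDF weight of each distinct word. The term frequency is
        # normalized by sentence length so long sentences do not win by default.
        score = sum(
            (count / len(tokens)) * math.log(n / df[word])
            for word, count in tf.items()
        )
        scores.append(score)

    # Pick the best-scoring sentence indices, then restore document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Calling extractive_summary(text, num_sentences=2) returns two sentences copied verbatim from the input, which is exactly what separates this approach from the abstractive one described next.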
In abstractive summarization, the summary is not a verbatim extraction of phrases or sentences but a restatement of the main ideas in the content. Abstractive summarization typically requires more complex techniques and models, such as neural networks. One popular library for abstractive summarization is Hugging Face Transformers.
We will focus on implementing abstractive summarization using Transformers.
Transformers: https://pypi.org/project/transformers/
Abstractive Summarization Using Transformers
Start by installing the Transformers library.
!pip install transformers
Next, we perform abstractive summarization using the Hugging Face Transformers library.
The summarization pipeline takes care of generating a concise summary based on the input text.
You can adjust parameters like max_length, min_length, and do_sample to control the length and quality of the generated summary.
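I want a summary of about 20 words, so I set both max_length and min_length to 20. Note that these parameters count tokens rather than words, so the word count of the result can differ from 20.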
from transformers import pipeline

# Specify the model name
model_name = "t5-small" # You can replace this with the model you prefer
# Load the summarization pipeline with the specified model
summarizer = pipeline("summarization", model=model_name)
# Input text
text = """
Machine learning, a subset of artificial intelligence, empowers computers to learn from data and make predictions or decisions without explicit programming. It has revolutionized various industries, from healthcare to finance. For instance, in healthcare, ML models analyze patient data to diagnose diseases early, improving treatment outcomes. In finance, ML algorithms detect fraudulent transactions swiftly, safeguarding financial systems. Additionally, ML enhances recommendation systems, enabling platforms like Netflix to suggest personalized content. Moreover, autonomous vehicles rely on ML for decision-making, enhancing road safety. NLP models, such as GPT-3, generate human-like text, transforming content creation and customer support. As ML continues to evolve, its applications are limitless, ushering in a new era of automation and data-driven decision-making across diverse domains.
"""
# Generate an abstractive summary
summary = summarizer(text, max_length=20, min_length=20, do_sample=False)
# Print the summary
print(summary[0]['summary_text'])
You can replace "t5-small" with the name of the specific summarization model you want to use. By specifying the model name, you have more control over which model is employed for the task, and you can choose models that might better suit your requirements.
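If you plan to try several models, it can help to wrap the steps above in a small function. The sketch below is one possible way to do that, not part of the Transformers API: the names length_kwargs and summarize_with are hypothetical helpers I made up, and the library is imported inside the function so the length check can be used on its own. Only pipeline, max_length, min_length, and do_sample come from the code shown earlier.

```python
def length_kwargs(max_length, min_length=0, do_sample=False):
    """Validate the length settings and return them as keyword arguments.

    Hypothetical helper: it only checks that the values are consistent
    before they are handed to the summarization pipeline.
    """
    if max_length < 1 or min_length < 0:
        raise ValueError("max_length must be >= 1 and min_length must be >= 0")
    if min_length > max_length:
        raise ValueError("min_length must not exceed max_length")
    return {"max_length": max_length, "min_length": min_length, "do_sample": do_sample}


def summarize_with(model_name, text, **kwargs):
    """Load a summarization pipeline for model_name and return the summary text.

    Hypothetical helper: the pipeline is rebuilt on every call, which is
    simple but slow if you summarize many texts with the same model.
    """
    # Imported here so length_kwargs works even before the library is installed.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model_name)
    return summarizer(text, **kwargs)[0]["summary_text"]
```

For example, summarize_with("t5-small", text, **length_kwargs(max_length=20, min_length=20)) should behave like the code above, and passing a different model name is then the only change needed to compare models.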
Colab : https://colab.research.google.com/drive/1NvtX83bAfK74CGJNPO4QECt_h6785u_S?usp=sharing