![](https://crypto4nerd.com/wp-content/uploads/2023/05/0b2ykzNgqb_z2IT1N.png)
I’ve been using Python extensively for the last two years. As a result, I’m always looking for amazing libraries that can enhance my work in Data Engineering and Business Intelligence projects.
Although Python has many libraries for dates and times, I find Pendulum the easiest to use for any date operation. It is my favorite library for daily use at work. It extends the built-in Python datetime module with a more intuitive API for handling time zones and for operations such as adding time intervals, subtracting dates, and converting between time zones. It also provides a simple, human-friendly API for formatting dates and times.
!pip install pendulum
# import library
import pendulum
dt = pendulum.datetime(2023, 1, 31)
print(dt)
# local() creates a datetime instance in the local time zone
local = pendulum.local(2023, 1, 31)
print("Local Time:", local)
print("Local Time Zone:", local.timezone.name)
# Printing UTC time
utc = pendulum.now('UTC')
print("Current UTC time:", utc)
# Converting UTC timezone into Europe/Paris time
europe = utc.in_timezone('Europe/Paris')
print("Current time in Paris:", europe)
Have you ever seen foreign-language text in your data show up as garbled characters? This is called mojibake. Mojibake is garbled or scrambled text that results from encoding or decoding problems, typically when text written in one character encoding is decoded using a different one. The ftfy Python library helps you fix mojibake, which is very useful in NLP use cases.
!pip install ftfy
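# Importing library
import ftfy
# Each input below is mojibake; ftfy restores the intended characters
print(ftfy.fix_text('Correct the sentence using â€œftfyâ€\x9d.'))
print(ftfy.fix_text('âœ” No problems with text'))
print(ftfy.fix_text('Ã\xa0 perturber la rÃ©flexion'))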
Apart from mojibake, ftfy also fixes incorrect line endings and incorrect quotes. It can understand text that was decoded as any of the following encodings:
- Latin-1 (ISO-8859-1)
- Windows-1252 (cp1252 — used in Microsoft products)
- Windows-1251 (cp1251 — the Russian version of cp1252)
- Windows-1250 (cp1250 — the Eastern European version of cp1252)
- ISO-8859-2 (which is not quite the same as Windows-1250)
- MacRoman (used on Mac OS 9 and earlier)
- cp437 (used in MS-DOS and some versions of the Windows command prompt)
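To see how text gets garbled in the first place, here is a small sketch using only the standard library. It encodes a string as UTF-8 and then decodes it as Latin-1, one of the encodings listed above. This only illustrates the mechanism and is not how ftfy works internally; ftfy has to work out which wrong encoding was used, whereas here we already know it.

```python
original = "réflexion"

# Written as UTF-8 but read back as Latin-1: this is how mojibake appears
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)   # rÃ©flexion

# Reversing the mistake works only because we know which encoding was wrong
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # réflexion
```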
Sketch is an AI code-writing assistant designed for people who work with the pandas library in Python. It uses machine learning to understand the context of your data and suggests relevant code to make data manipulation and analysis easier. Sketch does not require any additional IDE plugins, so it is quick to start using. This can greatly reduce the time and effort needed for data-related tasks.
!pip install sketch
To use this library, we work through the .sketch extension that it adds to pandas data frames.
.sketch.ask
ask is a feature of Sketch that lets you ask questions about your data in natural language. It returns a text-based answer to the query.
# Importing libraries
import sketch
import pandas as pd
# Reading the data (using twitter data as an example)
df = pd.read_csv("tweets.csv")
print(df)
# Asking which columns are category type
df.sketch.ask("Which columns are category type?")
# To find the shape of the dataframe
df.sketch.ask("What is the shape of the dataframe")
.sketch.howto
howto is a feature that returns a code block you can use as a starting point for various data-related tasks. We can ask for code snippets to normalize our data, create new features, plot data, and even build models. This saves time: you can copy and paste the code instead of writing it from scratch.
# Asking for a code snippet to visualize the emotions
df.sketch.howto("Visualize the emotions")
.sketch.apply
The .apply function helps to generate new features, parse fields, and perform other data manipulations. To use it, we need an OpenAI account and an API key. I haven’t tried this function.
I enjoyed using this library, especially the howto function, and I find it useful.
“pgeocode” is an excellent library I recently came across, which has been incredibly useful for my spatial analysis projects. For example, it allows you to find the distance between two postcodes and provides geo-information by taking a country and a postcode as inputs.
!pip install pgeocode
Getting geoinformation for specific postcodes
# Importing library
import pgeocode
# Checking for country "India"
nomi = pgeocode.Nominatim('IN')
# Getting geo information by passing the postcodes
nomi.query_postal_code(["620018", "620017", "620012"])
“pgeocode” calculates the distance between two postcodes by taking the country and postcodes as inputs. The result is given in kilometers.
# Finding a distance between two postcodes
distance = pgeocode.GeoDistance('IN')
distance.query_postal_code("620018", "620012")
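pgeocode does both the postcode lookup and the distance calculation for you. To give a feel for what a distance in kilometers between two locations involves, here is a sketch of the haversine great-circle formula using only the standard library. This is an illustration, not pgeocode's own code; the function name, the Earth radius constant, and the coordinates are assumptions of mine, and the coordinates are made-up values rather than real postcode locations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical coordinates, for illustration only
point_a = (10.80, 78.70)
point_b = (10.85, 78.75)
print(round(haversine_km(*point_a, *point_b), 2))
```

In practice the library call above is all you need; the sketch only shows why the result depends on latitude and longitude rather than on the postcode numbers themselves.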
rembg is another useful library that makes it easy to remove the background from images.
!pip install rembg
# Importing libraries
from rembg import remove
import cv2
# path of input image (my file: image.jpeg)
input_path = 'image.jpeg'
# path for saving the output image; PNG keeps the transparent background, which JPEG cannot store
output_path = 'output.png'
# Reading the input image
input_image = cv2.imread(input_path)
# Removing background
output = remove(input_image)
# Saving file
cv2.imwrite(output_path, output)
You may already be familiar with some of these libraries, but for me, Sketch, Pendulum, pgeocode, and ftfy are indispensable for my data engineering work. I rely on them heavily for my projects.
“Humanize” provides simple, easy-to-read string formatting for numbers, dates, and times. The library’s goal is to make data more human-friendly, for example, by converting a number of seconds into a more readable string like “2 minutes ago”. It can format data in various ways, including adding commas to numbers and converting timestamps into relative times.
I frequently work with integers, dates, and times in my data engineering projects.
!pip install humanize
# Importing library
import humanize
import datetime as dt
# Formatting numbers with comma
a = humanize.intcomma(951009)
# converting numbers into words
b = humanize.intword(10046328394)
#printing
print(a)
print(b)
import humanize
import datetime as dt
# Converting dates into natural-language strings
a = humanize.naturaldate(dt.date(2012, 6, 5))
b = humanize.naturalday(dt.date(2012, 6, 5))
print(a)
print(b)
If you like this blog, please show your appreciation by hitting the like button and sharing it. Also, drop a comment with any feedback or suggested improvements. Till then, HAPPY LEARNING!