Deepankar Yadav's Blog

Deepankar Yadav

I wrangle bytes at Cognizant, making Big Data sing. By night, I spill the beans (and code) on #dataengineering on Hashnode. Join me to conquer coding & laugh along the way! 🚀

Bucketing in Spark

Pinned

Jan 6, 2024 · 3 min read

🤔How Bucketing Organizes Your Apache Spark Universe⚡ · Bucketing 🪣 Bucketing is a way to assign rows of a dataset to specific buckets and collocate...

Understanding UDFs in PySpark

May 15, 2024 · 4 min read

Speed Demon or Traffic Jam? · Hey Spark enthusiasts! Today, we're diving into the world of User-Defined Functions (UDFs) in PySpark. UDFs are like custom...

Thin vs Thick vs Balanced Executors

May 9, 2024 · 8 min read

Choose the right config for your Spark application · In the world of Apache Spark, two types of executors reign supreme: thin and thick. Each has its own...

Join Strategies in Apache Spark

Feb 26, 2024 · 4 min read

We are all quite familiar with join operations in Spark, but did you know Spark has some built-in tricks for performing joins efficiently without...

Stop "WithColumn" Chain

Jan 11, 2024 · 2 min read

Breaking News in Dataland: The WithColumn Chain is a Performance Thief! · Attention, PySpark wranglers! We've uncovered a hidden culprit that's been...