Project Description
Domains: Web Scraping, NLP, APIs for Communication, Database Management
Project Overview: The project aims to build techno-sales products for the financial firm AlgoBulls. The first step is to create a repository of active traders by extracting information available on social media platforms, primarily LinkedIn and Twitter. Once the repository is ready, each person's social profile will be derived using classical and NLP tools, and a communication system will be set up to follow up with them automatically by email. Prototype AI-based chat bots will also be explored as an additional feature.
Work Description
Roles: 3 Web Scraping/NLP developers
Stipend: Paid project (join community for more details)
Project Duration: 3+ months
Tasks/Deliverables
- R&D on web scraping methods (APIs, use of proxies, etc.)
- Initial data extraction and back-end scripting
- Social profiling through NLP tools
- API integration for email communication
- Chat bot design and implementation
- Analysis and cleaning of web-scraping outputs
- Pipeline design and optimization
- Design and maintenance of the database
Skills Learned
- Proficiency in web scraping methodologies
- Knowledge of API integration and usage
- Data extraction, processing and cleaning
- NLP tools for social profiling
- Chat bot integration
- Database management
- Back-end scripting
Qualifications Required
Experience: Strong Python programming skills are required. Experience with web-scraping libraries (bs4, Scrapy, etc.) and familiarity with databases (PostgreSQL) are appreciated.
Year of Study: Second year or above
How to Apply?
Submission Link: https://forms.gle/sWKMSDpAG2urdoKM9
Deadline: 11:59 PM, 18th February, 2024
To apply for the project, you must fill out the form above. For additional credit, you can attempt the assignment below and submit it to the best of your ability, using any online tools you find helpful. We will contact you personally if you are shortlisted for the interview.
(Optional) Assignment:
The goal of this assignment is to perform web scraping to gather relevant information from social media sites. You can scrape either LinkedIn or Twitter.
LinkedIn Scraping:
- Scrape the LinkedIn profiles of about 100 professionals working in software development
- Extract the following attributes for each profile:
  - Name
  - Current Position
  - Skills
  - LinkedIn URL
- Structure this data and present it as a PostgreSQL data table
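One possible way to structure the scraped profiles before loading them into PostgreSQL is sketched below. The `Profile` class, the `linkedin_profiles` table, its column names, and the `insert_profiles` helper are all hypothetical names chosen for illustration. The sketch also assumes a psycopg2-style DB-API connection, which uses `%s` placeholders and adapts Python lists to PostgreSQL arrays; adjust it to whichever driver and schema you choose.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical schema: the table and column names are illustrative only.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS linkedin_profiles (
    linkedin_url     TEXT PRIMARY KEY,
    name             TEXT NOT NULL,
    current_position TEXT,
    skills           TEXT[]
);
"""

# ON CONFLICT lets the loader be re-run without duplicating profiles.
INSERT_SQL = """
INSERT INTO linkedin_profiles (linkedin_url, name, current_position, skills)
VALUES (%s, %s, %s, %s)
ON CONFLICT (linkedin_url) DO NOTHING;
"""


@dataclass
class Profile:
    """One scraped profile with the four attributes listed above."""
    name: str
    current_position: str
    linkedin_url: str
    skills: List[str] = field(default_factory=list)

    def to_row(self) -> Tuple[str, str, str, List[str]]:
        # Trim whitespace and drop empty or repeated skills, keeping order.
        unique_skills = dict.fromkeys(s.strip() for s in self.skills if s.strip())
        return (
            self.linkedin_url.strip(),
            self.name.strip(),
            self.current_position.strip(),
            list(unique_skills),
        )


def insert_profiles(conn, profiles: List[Profile]) -> int:
    """Create the table if needed and insert the profiles.

    `conn` is assumed to be a DB-API connection, for example one
    returned by psycopg2.connect(...). Returns the number of rows sent.
    """
    rows = [p.to_row() for p in profiles]
    cur = conn.cursor()
    try:
        cur.execute(CREATE_TABLE_SQL)
        cur.executemany(INSERT_SQL, rows)
        conn.commit()
    finally:
        cur.close()
    return len(rows)
```

Using the profile URL as the primary key is one design choice among several; it keeps repeated scraping runs from inserting the same person twice.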
Twitter Scraping:
- Search for the hashtag “#coding” and scrape 1000 tweets that contain this tag
- Extract the following information for each tweet:
  - Tweet content
  - Number of likes
  - Twitter handle of the user
- Present this data as a PostgreSQL data table
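Before loading tweets into a table, it can help to confirm that each one really contains the hashtag and to drop duplicates. The sketch below shows one way to do that. The `Tweet` class and the `contains_hashtag` and `select_tweets` helpers are hypothetical names, and the sketch assumes the scraper has already produced the content, like count, and handle for each tweet.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Tweet:
    """One scraped tweet with the three fields listed above."""
    content: str
    likes: int
    handle: str


def contains_hashtag(text: str, tag: str = "coding") -> bool:
    """Return True if `text` contains #tag as a whole hashtag.

    The match is case-insensitive, and longer tags such as
    #codingbootcamp do not count as #coding.
    """
    pattern = rf"(?<!\w)#{re.escape(tag)}(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def select_tweets(tweets: Iterable[Tweet], tag: str = "coding",
                  limit: int = 1000) -> List[Tweet]:
    """Keep up to `limit` unique tweets that contain the hashtag."""
    seen = set()
    selected: List[Tweet] = []
    for tweet in tweets:
        # Treat the same text from the same handle as a duplicate.
        key = (tweet.handle.lower(), tweet.content)
        if key in seen or not contains_hashtag(tweet.content, tag):
            continue
        seen.add(key)
        selected.append(tweet)
        if len(selected) >= limit:
            break
    return selected
```

The resulting list can then be inserted into a PostgreSQL table with one column per field.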
Submit a GitHub repository containing the web-scraping script, a CSV file with the relevant information, and a snapshot of the data table.
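For the CSV part of the submission, a minimal sketch using Python's standard `csv` module is shown below. The `write_csv` helper, the `TWEET_FIELDS` column list, and the sample row are hypothetical and only illustrate the idea; use whichever columns match the site you scraped.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Hypothetical column order for the Twitter variant of the assignment.
TWEET_FIELDS = ["handle", "content", "likes"]


def write_csv(rows: Iterable[Dict[str, object]], fieldnames: List[str],
              out: TextIO) -> int:
    """Write `rows` to `out` as CSV with a header; return the row count.

    Keys that are not in `fieldnames` are ignored, so the scraper can
    carry extra fields without breaking the export.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Illustrative row only, written to an in-memory buffer.
    buffer = io.StringIO()
    write_csv([{"handle": "@example", "content": "Hello, #coding", "likes": 0}],
              TWEET_FIELDS, buffer)
    print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` and an explicit encoding such as `utf-8`, as the `csv` module documentation recommends, so that line endings and non-ASCII tweet text are handled correctly.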
Contact Us
For any general queries, join the ProSpace WhatsApp group: https://chat.whatsapp.com/E09qtrcuShp1uf2w82LCsa
For assignment queries, contact:
Email: satyamm435@gmail.com
Phone: 9324865787