Duration: 3 or 6 months, depending on the applicant’s preference
Overview:
PatentAssist.ai is at the forefront of leveraging advanced AI and machine learning technologies to revolutionize patent analysis and management. We are looking for a passionate and skilled Data Engineer Intern to join our team in developing a cutting-edge patent database utilizing vector embedding technologies. This project involves data scraping, refining, denoising, and organizing vast amounts of patent information into a highly efficient and searchable vector embedding database.
Responsibilities:
- Collaborate with the development team to design and implement a vector embedding database tailored for patent data.
- Employ data scraping techniques to gather patent information from various sources.
- Refine and preprocess data to improve quality and relevance, including denoising and organizing it effectively.
- Use embedding models to convert patent documents into high-dimensional vectors, facilitating efficient storage and retrieval.
- Implement indexing and search algorithms to support similarity searches within the patent database, enabling advanced query capabilities.
- Monitor and optimize the performance of the database, ensuring high efficiency and accuracy in data handling.
- Document the development process, including challenges faced and solutions implemented, contributing to the project’s knowledge base.
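For candidates wondering what the embedding and similarity-search responsibilities above might look like in practice, here is a minimal illustrative sketch in Python. It is not PatentAssist.ai's implementation: the patent titles, the `embed` and `search` helpers, and the simple word-count vectors are all hypothetical stand-ins for a real embedding model (such as MiniLM) and a real vector index.

```python
import math
import re
from collections import Counter

# Hypothetical sample data, invented for illustration only.
PATENTS = {
    "P1": "Battery cooling system for electric vehicles",
    "P2": "Method for wireless charging of electric vehicles",
    "P3": "Drug delivery device with a microneedle array",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, vocab):
    """Toy embedding: a unit-length word-count vector over a fixed vocabulary.

    A real pipeline would call an embedding model here instead.
    """
    counts = Counter(tokenize(text))
    vec = [float(counts[term]) for term in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def search(query, index, vocab, k=2):
    """Brute-force cosine similarity search; returns the top-k (score, id) pairs."""
    q = embed(query, vocab)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), doc_id)
        for doc_id, vec in index.items()
    ]
    scored.sort(reverse=True)
    return scored[:k]


# Build the vocabulary and the in-memory "vector database".
VOCAB = sorted({tok for text in PATENTS.values() for tok in tokenize(text)})
INDEX = {doc_id: embed(text, VOCAB) for doc_id, text in PATENTS.items()}

if __name__ == "__main__":
    for score, doc_id in search("battery cooling for vehicles", INDEX, VOCAB):
        print(f"{doc_id}  score={score:.3f}  {PATENTS[doc_id]}")
```

Because the vectors are normalized to unit length, the dot product equals cosine similarity. At production scale, the brute-force scan would typically be replaced by an approximate nearest-neighbor index.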
Requirements:
- Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field.
- Strong foundation in database management systems (SQL and NoSQL, e.g. MongoDB), with a keen interest in vector databases and AI applications.
- Proficiency in programming languages such as Python, along with experience in data scraping and preprocessing techniques.
- Familiarity with machine learning models, particularly for creating and managing vector embeddings (e.g. using LLMs, OpenAI models, or MiniLM).
- Optional: Some knowledge of Elasticsearch infrastructure.
- Excellent problem-solving skills and the ability to work independently on complex tasks.
- Strong communication skills, capable of collaborating effectively with a remote team.
Benefits:
- Valuable hands-on experience in a cutting-edge field at the intersection of AI, machine learning, and patent management.
- Internship and work experience certificate upon successful completion of the project.
- Competitive salary, negotiable based on skill level and performance.
- Opportunity for an endorsement certificate from PatentAssist.ai for outstanding contributions to the project.
Application Process:
Interested candidates should submit their resume to [email protected] along with a cover letter highlighting their experience with and interest in vector, SQL, and NoSQL databases and AI technologies. Please specify whether you are available for the 3-month or the 6-month internship.