HackathonParty

Framing the Problem

What is the problem you are trying to solve? Who does it affect?

HemaLink aims to address low health literacy among the general population.

For most people, blood test results arrive as an overwhelming collection of complex biomarkers and medical jargon. Terms like LDL cholesterol, HbA1c, and AST mean little to someone without medical training. Receiving such detailed information about oneself and not being able to understand any of it is frustrating, and that is where HemaLink comes in. With HemaLink, users can translate their documents into plain language to see if there is anything they need to worry about. HemaLink is built not only for the average person, but also for caregivers and health educators who need to track and communicate information effectively.

Idea Explanation

What is your idea? How does it fix the problem?

Our idea was to build an application that uses modern machine-learning models to bridge the language barrier between complex medical information and the average person. HemaLink takes in complex and confusing data and saves the user hours of research. It gives the user data-driven suggestions on how to improve their health by translating specific biomarkers into actionable lifestyle changes.

Implementation

How do all the pieces fit together? Does your frontend make requests to your backend? Where does your database fit in?

HemaLink's frontend is built using Next.js (React) for a fast and responsive user experience on both web and mobile. It handles user authentication with Clerk so that users have an account through which they can access their data later. The frontend also handles file upload and sends API requests to the Python backend, which processes the uploaded files.

The backend is a Python FastAPI server that exposes the endpoints the frontend calls. FastAPI's asynchronous request handling suits the real-time OCR and ML tasks the application depends on. We tested different machine learning models using scikit-learn on openly available datasets, making sure they were HIPAA compliant, meaning they did not include any data that could be used to identify people. We used an open-source library to scan each file uploaded by the user and convert it into a JSON object. That object is parsed to obtain the blood test data, which is then run through a Random Forest classifier for each disease to predict the likelihood that the user is prone to that disease.
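The per-disease Random Forest step described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the feature names in `FEATURES`, the helper names `train_models` and `predict_risks`, and the synthetic training data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature order shared by every model (not taken from the project).
FEATURES = ["glucose", "hba1c", "ldl", "ast"]


def train_models(datasets):
    """Fit one Random Forest per disease.

    `datasets` maps a disease name to (X, y), where X has one column per
    entry in FEATURES and y holds 0/1 labels.
    """
    models = {}
    for disease, (X, y) in datasets.items():
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit(X, y)
        models[disease] = clf
    return models


def predict_risks(models, record):
    """Return {disease: probability of the positive class} for one parsed record."""
    row = np.array([[record[name] for name in FEATURES]], dtype=float)
    risks = {}
    for disease, clf in models.items():
        positive = list(clf.classes_).index(1)  # column of the "1" label
        risks[disease] = float(clf.predict_proba(row)[0][positive])
    return risks


# Synthetic placeholder data, used only so the sketch runs end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(FEATURES)))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

models = train_models({"diabetes": (X, y)})
record = {"glucose": 1.5, "hba1c": 1.2, "ldl": 0.0, "ast": 0.0}
risks = predict_risks(models, record)
```

Keeping one classifier per disease means each model can be retrained or replaced independently, at the cost of every model having to agree on the same input feature order.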

Challenges

What did you struggle with? How did you overcome it?

One of our biggest struggles was data extraction and normalization. Before jumping into a project that handles data and machine learning, it is important to agree on what data is needed for each task, such as training and testing. When we discussed this, we left out important details, which caused us to backtrack a lot. For example, while the model was being trained on one set of metrics, the data being collected was based on a different set of metrics that the model could not use. Some of us were using unfamiliar frameworks, and some were working with machine learning for the first time, but communication was the factor that made this project most challenging.
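One way to catch this kind of mismatch early is to define the model's expected features in a single place and validate every extracted record against it. The sketch below is only an illustration under assumed names: `MODEL_FEATURES`, `ALIASES`, and `normalize` are hypothetical, and the alias list is a placeholder rather than the project's real mapping.

```python
# Hypothetical canonical feature names the models are trained on.
MODEL_FEATURES = ("glucose", "hba1c", "ldl")

# Hypothetical spellings that might appear on a scanned report.
ALIASES = {
    "glucose": "glucose",
    "fasting glucose": "glucose",
    "hba1c": "hba1c",
    "hemoglobin a1c": "hba1c",
    "ldl": "ldl",
    "ldl cholesterol": "ldl",
}


def normalize(extracted):
    """Map extracted field names onto the model's feature names.

    Unknown fields are ignored; a missing model feature raises ValueError
    so the mismatch surfaces before anything reaches the model.
    """
    record = {}
    for raw_name, value in extracted.items():
        key = ALIASES.get(raw_name.strip().lower())
        if key is not None:
            record[key] = float(value)
    missing = [name for name in MODEL_FEATURES if name not in record]
    if missing:
        raise ValueError(f"missing features for model: {missing}")
    return record
```

With a shared definition like this, both the training code and the extraction code import the same feature list, so a disagreement shows up as an immediate error rather than during integration.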

Accomplishments

What did you learn? What did you accomplish?

We developed a full-stack application that takes in a user file, interprets it, and communicates the results to the user. Some people on the team learned new frameworks and APIs, such as Next.js and Clerk authentication, for the first time. We learned how to get a frontend to send requests to a backend and have the backend return its result to the frontend. This project helped everyone on the team gain a much better grasp of full-stack development as a whole.

Next Steps

What are the next steps for your project? How can you improve it?

We want to keep adding models that test for more diseases, and possibly expand the types of tests we take in and interpret. We may also implement more specific diagnoses, for example hyperthyroidism and hypothyroidism. We are also thinking of using AWS Textract in place of Tesseract so that we can further improve our cloud-computing skills.

HemaLink