Guidance for a beginner to extract data from a website

  • Thread starter Mr.Husky
In summary, the thread starter is a high school graduate about to major in computer science and engineering who has set a goal: find the name of the person who holds a known rank in a competitive exam whose results can be looked up by full name. They ask whether this is achievable and what skills it would require. They have looked up the results of more than 30 people whose names they know, but note that guessing names does not scale to the 137,000+ candidates. Respondents suggest asking the exam body or other students, using a public API if one exists, or scraping the site's HTML with a tool such as BeautifulSoup. One respondent notes that hacking is generally not considered a main component of computer science, and the thread is closed for moderator discussion.
  • #1
Mr.Husky
TL;DR Summary
Guidance to extract data from a website for a beginner.
Hello!
I just completed high school and am about to major in computer science and engineering. I thought it would be better to set myself a goal to stay interested in the field. It is simple, concrete, and I think it is doable. I need someone to guide me because I know nothing about CS.

My goal is to find the name of a person whose "rank" in a competitive exam is known. That's it. Let me expand on it. Recently, the body that conducts the exam released results that can be looked up by name. That means you don't have to enter any other details or verify yourself to see your own or anyone else's result; you only need the full name. They also provide some data related to your own rank. I got rank 9307 in this exam, and the data said, "no. of students with same rank, boys - 0 and girls - 1". My goal is to find who got that rank. If you know the name, you just enter it and see the rank. If I know the rank, can I conversely find the name? Is it possible? I know nothing about web applications. Do you think it is doable? If so, how should I approach it? What skills do I need? If you know how to do this task, please don't spell out the process, but guide me so that I can do it myself. I recently opened a book that said to type print("hello world!") in Python, and boom, I got the same words on the next line. Then I stopped learning programming. I didn't find it exciting. Maybe this task will teach me something.

Thank you!
Ganesh kumara.
 
  • #2
The most effective ways of doing what you want are to either:

a) ask the marking body who belongs to that ranking, or

b) inquire amongst the students being ranked to see who matches.

However, bear in mind that the phrase "no. of students with same rank" may not mean "no. of other students with same rank", i.e., one person holds that rank: presumably you.
 
  • #3
So you have no idea what the names are of the others who took the exam, and want to keep trying random names until you find a score equal to your 9307?

If the website is designed well, it will lock you out after you have tried 3-4 random names with no match to the database.
 
  • Like
Likes Vanadium 50
  • #4
berkeman said:
So you have no idea what the names are of the others who took the exam, and want to keep trying random names until you find a score equal to your 9307?

If the website is designed well, it will lock you out after you have tried 3-4 random names with no match to the database.
Well, that's not the case, sir. I don't know whether it is ethical or not, but I checked the results of more than 30 people whose names I know. (Some are from the exam hall, some from my college.)

The problem is that trying random names doesn't work because the total number of students who participated is 137,000+.
 
  • #5
hmmm27 said:
The most effective ways of doing what you want are to either:

a) ask the marking body who belongs to that ranking, or

b) inquire amongst the students being ranked to see who matches.

However, bear in mind that the phrase "no. of students with same rank" may not mean "no. of other students with same rank", i.e., one person holds that rank: presumably you.
For option b, the total number of students who participated is 137,000+.

Thanks. I rechecked the analytics they provided and it said, "No. of Girls (Equal your Rank)=1". Since I am a boy, there must be a girl with the same rank. But my interest is not in figuring out who is that but to understand what I can do in computer science and what I can't do. I just got this idea and want to know is it possible to conversely find the data from a website?
 
  • #6
Mr.Husky said:
is it possible to conversely find the data from a website?
If you're lucky and the website has a public API, you can just use that.

Most websites don't, though, so your only option other than manually browsing is to scrape the data--write a program to automatically download web pages and extract data from the HTML. My usual go-to in Python for doing that is BeautifulSoup.
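
As a rough illustration of the scraping approach described here (this is not code from the poster), the sketch below parses a small, made-up HTML snippet with BeautifulSoup. It assumes the beautifulsoup4 package is installed; SAMPLE_HTML, the "books" table id, and extract_rows are hypothetical names invented for the example. A real program would first download the page, and only from a site that permits automated access.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <table id="books">
    <tr><th>Title</th><th>Price</th></tr>
    <tr><td>Learning Python</td><td>39.99</td></tr>
    <tr><td>Web Scraping Basics</td><td>24.50</td></tr>
  </table>
</body></html>
"""


def extract_rows(html):
    """Return one dict per data row of the table with id="books"."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="books")
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) != 2:  # skips the header row, which uses <th> cells
            continue
        rows.append({
            "title": cells[0].get_text(strip=True),
            "price": float(cells[1].get_text(strip=True)),
        })
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

The general pattern is to parse the markup once, locate the element that holds the data, and turn each repeated child into a plain Python record.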
 
  • Informative
Likes Mr.Husky
  • #7
Mr.Husky said:
But my interest is not in figuring out who is that but to understand what I can do in computer science and what I can't do.
Hacking is generally not considered a main component of computer science.
 
  • Like
Likes berkeman, Vanadium 50 and Mr.Husky
  • #8
PeterDonis said:
If you're lucky and the website has a public API, you can just use that.

Most websites don't, though, so your only option other than manually browsing is to scrape the data--write a program to automatically download web pages and extract data from the HTML. My usual go-to in Python for doing that is BeautifulSoup.
So I have to learn Python now. Thanks for mentioning BeautifulSoup; I just got to know about it. I will learn how to code in Python, and maybe after a few months I will get to know who got the same rank.
 
  • Skeptical
Likes berkeman
  • #9
This is really creepy.

Thread closed, at least temporarily, for moderator discussion.
 
  • Like
Likes Vanadium 50 and berkeman

1. What is data extraction and why is it important?

Data extraction refers to the process of collecting data from various sources, such as websites, and organizing it in a structured format for analysis. It is important because it allows researchers and businesses to gather relevant and accurate information to make informed decisions and gain valuable insights.

2. How do I extract data from a website as a beginner?

As a beginner, the easiest way to extract data from a website is to use a web scraping tool or a programming language such as Python. With either, you specify the pages you want to scrape and the data you want to extract, then save the result in a structured format for further analysis.
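
The "specify, extract, save" steps can be sketched with nothing but Python's standard library. This is a minimal illustration under assumptions, not a recommended tool: the HTML snippet, the "item" class name, ItemCollector, and items_to_csv are all made up for the example, and the page is an inline string rather than a real download.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li class="item">Alpha</li>
  <li class="note">ignore me</li>
  <li class="item">Beta</li>
</ul>
"""


class ItemCollector(HTMLParser):
    """Collect the text of every <li class="item"> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._inside_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li" and dict(attrs).get("class") == "item":
            self._inside_item = True
            self.items.append("")

    def handle_data(self, data):
        if self._inside_item:
            self.items[-1] += data

    def handle_endtag(self, tag):
        if tag == "li":
            self._inside_item = False


def items_to_csv(html):
    """Extract the items and return them as CSV text with a header row."""
    parser = ItemCollector()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["position", "name"])
    for position, name in enumerate(parser.items, start=1):
        writer.writerow([position, name.strip()])
    return out.getvalue()


if __name__ == "__main__":
    print(items_to_csv(SAMPLE_HTML))
```

Writing to CSV keeps the result readable in a spreadsheet; swapping io.StringIO for an opened file would save it to disk instead.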

3. What are some common challenges I may face when extracting data from a website?

Some common challenges when extracting data from a website include dealing with dynamic content, coping with anti-scraping measures such as lockouts after repeated requests, and managing large amounts of data. A basic understanding of HTML and CSS helps you locate the data within a page's markup.
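
Rather than working against a site's limits, a scraper can pace itself so that it stays under them. The sketch below is one possible way to do that, assuming a fixed minimum interval between requests is acceptable to the site; the Throttle class and its parameters are hypothetical names, and the clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was allowed

    def wait(self):
        """Sleep just long enough to respect min_interval; return the delay."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.2)
    for page in range(3):
        waited = throttle.wait()
        # A real scraper would issue its request here.
        print(f"page {page}: waited {waited:.2f}s")
```

Calling wait() before every request keeps the request rate predictable no matter how quickly the rest of the loop runs.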

4. Are there any legal considerations when extracting data from a website?

Yes, there are legal considerations to keep in mind when extracting data from a website. Make sure to read the website's terms and conditions and follow any guidelines or restrictions they have in place. Additionally, be mindful of any copyright laws and ensure that you are not infringing on any intellectual property rights.
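
Alongside the terms and conditions, many sites publish a robots.txt file stating which paths automated clients may visit. It is not a substitute for reading the terms, but it can be checked in code. The sketch below uses the standard library's urllib.robotparser on an inline, made-up robots.txt; the rules, the "my-bot" user agent, and the example.com URLs are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would load the site's own file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/public/page",
                "https://example.com/private/data"):
        print(url, "allowed:", rules.can_fetch("my-bot", url))
    print("requested crawl delay:", rules.crawl_delay("my-bot"))
```

For a live site, RobotFileParser.set_url followed by read() loads the file over the network instead of parsing a string.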

5. How can I ensure the accuracy and reliability of the data I extract from a website?

To ensure the accuracy and reliability of the data, it is important to verify the source of the data and cross-check it with other sources. Additionally, regularly check and update your scraping code to ensure it is capturing the most recent and relevant data from the website.
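
Cross-checking can be as simple as comparing the same records from two sources and listing the disagreements. The sketch below assumes each source has already been reduced to a dictionary keyed by some shared identifier; find_mismatches and the sample data are hypothetical and exist only for this illustration.

```python
def find_mismatches(primary, secondary):
    """Return {key: (primary_value, secondary_value)} for every disagreement.

    A key missing from one source is reported with None on that side.
    """
    problems = {}
    for key in sorted(set(primary) | set(secondary)):
        a = primary.get(key)
        b = secondary.get(key)
        if a != b:
            problems[key] = (a, b)
    return problems


if __name__ == "__main__":
    scraped = {"A": 1, "B": 2, "C": 3}    # e.g. values extracted from a page
    reference = {"A": 1, "B": 5, "D": 4}  # e.g. values from a second source
    for key, (a, b) in find_mismatches(scraped, reference).items():
        print(f"{key}: scraped={a!r} reference={b!r}")
```

Rerunning a check like this after each scrape also shows when a site's layout has changed and the extraction code needs updating.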
