Announcements

For the third paper reading session, we'll cover the topic of sentiment analysis. You are required to read one paper and write a report.

Please pick one paper from the list:

The paper discussion will be held in class on Wednesday, May 30. If you miss it, you'll need to hand in answers to an additional set of questions that we'll publish after the class.

Make sure to fill out the form by Tuesday, May 29 at midnight.

Relax and enjoy the papers!

Author: Mladen Karan

The third paper reading session, initially scheduled for Friday, June 1, has been moved to Wednesday, May 30, from noon to 2 p.m. in lecture room D-273. The deadline for submitting the reviews for this session is Tuesday, May 29 at midnight.

 

Author: Mladen Karan

Since the sessions and reports are a substitute for the final exam, we take them quite seriously. Even if you skip a reading session (which you are allowed to do only once), you still have to submit the reviews via Google Forms on time, just like everybody else. Please bear that in mind for the upcoming and all future paper reading sessions.

Author: Jan Šnajder

The second reading session covers question answering. You are required to read one paper and write a report for it. You can find more information about the papers and links to the corresponding forms below.

Please pick one paper from the list:

The report forms are online, so you should start reading right away! We'll discuss the papers in class next Friday, May 25. If you miss it, you'll need to hand in answers to an additional set of questions that we'll publish after the class.

Make sure to fill out the form by Thursday, May 24 at 7 p.m.

Enjoy! :)

Author: Mladen Karan

As we outlined at the beginning of this course, we want you to earn course points by reading scientific papers. We prefer this over traditional exams because we strongly believe that traditional exams are not suitable for courses like TAR. Even though this is (probably) radically different from everything you've done in other courses so far, we have no doubt that you'll find it interesting and useful, regardless of whether you plan to pursue a career in academia or industry.

For the first "reading session", we'll cover the task of information extraction. You are required to read two papers and write a report for each of them. The report comprises a short summary, answers to a set of comprehension questions, comments on both the strengths and the weaknesses of the approach described in the paper, and at least one question for us about the paper. Additionally, we want to know how much time you spent doing this.

You can find more information below.

Please pick two papers from the list:

We'll discuss the papers in class next Friday, May 18. If you miss it, you'll need to hand in answers to an additional set of questions that we'll publish after the class. Keep in mind that you may miss at most one paper reading session in this way.

As mentioned above, we want to know how much time is required for a single paper reading session. Therefore, we kindly ask you to time-track. We recommend using Toggl, a time-tracking application available for every popular platform (including browsers). Note that this is mandatory, as you will have to report the time spent in the report.

Happy reading! :)

EDIT: The deadline for submitting your answers via the forms is the start of the next lecture, i.e., Friday, May 18 at 11 a.m.

Author: Mladen Karan
Results of the midterm exam

The results of the midterm exam are visible in FerWeb. You can come and take a look at your exams on Wednesday, May 16 at 2 p.m. in room D-305.

Author: Mladen Karan

Next week, on Tuesday and Wednesday, May 8 and 9, we'll organize the project checkpoint. It will be held in room D-305 (inside the ZEMRIS department). If the doors are locked, please wait at the department entrance. Note that it is not mandatory for all team members to be present at the checkpoint.

You can find the time slots below.

EDIT: We have tweaked the schedule below to minimize unnecessary waiting, so be sure you don't miss the changes.

Author: Mladen Karan

The project checkpoint sessions will be held on Tuesday and Wednesday, May 8 and 9. Attending the project checkpoint session is mandatory for all teams (though it is not mandatory for all team members to be present). You will be asked to deliver an informal presentation of what you've done so far, so that we can make sure you're going in the right direction. Moreover, based on your presentation, we'll get an initial impression of your efforts, which will affect your final project score. We ask you to take this very seriously and come prepared.

For those who are still unsure about what to present, we refer to the project vademecum, where it says:

The project checkpoint stands as a final check before the final project submissions. For the checkpoint, you are required to already have a certain portion of work done: you should at least have a working baseline system, evaluated using the official evaluation metrics.

The exact time slots for each team will be announced next week.

Author: Mladen Karan

The midterm exam will be held on Monday, April 23 at 11:30 a.m. in lecture room D-1. The exam slot might not be visible in your FerWeb calendars, so pay extra attention. The duration of the exam is 90 minutes. In addition to general stationery, you may bring a calculator.

The midterm exam covers topics 1--7 (consult the course syllabus). It consists solely of short open-response questions in which you demonstrate understanding of the material. Examples of such questions would be "Explain the advantages of approach X compared to approach Y." or "What would happen in method X if we skipped step Y?" In some of the questions, you might be asked to perform minor calculations (similar to those required for some of the multiple-choice questions or shorter problem questions in previous years' exams). You can find the old midterm exams in the repository, but keep in mind that they are not fully representative of this year's exam format. To make it easier to prepare for the exam, consult the learning outcomes.

Author: Mladen Karan

The topic of the next lecture is Text Classification, Clustering, and Latent Models. To prepare for the class, please find time to read the following:

  • IIR Chapter 13: Text classification & Naive Bayes
    • Section 13.3
  • IIR Chapter 18: Matrix decompositions and latent semantic indexing
    • Section 18.4
Author: Mladen Karan

Keep in mind that the paper reading sessions in the second part of the semester fill the role of a standard final exam. Consequently, attending paper reading sessions is a requirement for passing the course. You may miss at most one session, but will then be asked to hand in answers to additional questions about the paper you've read. Please refer to slide 72 in the introductory lecture for more details.

Author: Mladen Karan

The topic of the next lecture is Improved Search, Evaluation, and Web Search. To prepare for the class, please find time to read the following:

  • IIR Chapter 9: Relevance feedback and query expansion
    • Sections 9, 9.1, 9.1.3, 9.1.5, 9.1.6, and 9.1.7
  • IIR Chapter 21: Link Analysis
    • Sections 21, 21.1, 21.2, and the beginning of 21.3
Author: Mladen Karan

We've finished assigning the project topics to teams. The resulting assignment is fairly good: only four teams were not assigned their most preferred topic (but did get their second choice). You can find the topic assigned to your team in the extended post.

Now that your team has a project topic, you can start working on your project! The next step is the checkpoint on May 7-9, where we will check on your progress. If you are unsure about how to start and think you need additional guidelines, feel free to visit us during office hours; that's what we're here for.

 

Author: Mladen Karan

In case you haven't been able to find a project team, please send an e-mail to mladen.karan@fer.hr by 11:59 p.m. today (you may optionally include topic preferences).

Author: Mladen Karan

Answering TAR-related questions over e-mail is extremely tedious: it's often ambiguous, it lacks helpful visualizations, and, most importantly, it takes a lot of time on both sides. To make our lives easier, please come to the office hours each Friday at 2 p.m. To apply, fill out this form. Please describe what bothers you in the form: we're only human, so we can't know everything about every TAR topic in advance. We'll make sure to prepare for your questions beforehand.

You must apply for the office hours by Wednesday at 11:59 p.m. at the latest. We'll then contact you with information about the location.

Author: Mladen Karan

The topic of the next lecture is Basics of Information Retrieval. To prepare for the class, please find time to read the following:

  • IIR Chapter 1: Boolean retrieval
    • Section 1.1
  • IIR Chapter 6: Scoring, term weighting & the vector space model
    • Sections 6.2, 6.2.1, 6.2.2, 6.3, and 6.3.1
  • IIR Chapter 12: Language models for information retrieval
    • Sections 12.1.2, 12.1.3, and 12.2.1
Author: Mladen Karan

For more details on the project topics, please visit the links provided alongside the topic descriptions (especially the competition websites, where available). If you have any questions, please let us know.

You can bid for the projects by filling out this form. Only one member of the team is required to send the team's information and preferences. Bidding will be open until Wednesday, March 21 at 11:59 p.m. Each team must consist of three members (or two if you couldn't find a third member or think you can handle the given task with fewer members). After the deadline, all students without a team will be assigned one by the TAs.

We will do our best to optimize your choices and assign each team its highest possible pick (time of submission is not a factor in the optimization). You can expect the topics to be assigned on Friday, March 23.

Author: Mladen Karan

You can find the topics for this year's TAR project at this link and a project vademecum (because why not :) at this link. You should form teams of 2-3 (preferably 3) people and figure out which topics interest you the most. Further instructions on how to sign up your team and define your topic preferences via a Google form will be posted shortly.

 

Author: Mladen Karan

The topic of the next lecture is Basics of Natural Language Processing. To prepare for the class, please find time to read the following:

  • FSNLP Chapter 1: Introduction
    • Sections 1.1, 1.3, 1.4.1, and 1.4.2
  • FSNLP Chapter 3: Linguistic Essentials
    • Sections 3.1, 3.2, and 3.3

Note: Sections do not include subsections, e.g., you are required to read Section 3.1, but not 3.1.1. Of course, feel free to read them as well if you want to!

Author: Mladen Karan

The Text Analysis and Retrieval course starts on Friday, March 9 at 11:15 a.m. in lecture room D-273. Welcome aboard!

Author: Mladen Karan
TAR 2018: Important information for...

We're excited to announce the fifth edition of the Text Analysis and Retrieval (TAR) course. If you're interested in search engines, text analysis, statistical natural language processing, and the application of machine learning to natural language processing, then this course is for you. However, since TAR might be a bit different from the courses you took in the past, we ask you to take into account the following information before you make your final decision about enrolling in the course.

  1. TAR is taught in English only (level L3). All course material, including exams, will be in English. There is no "Croatian group" and no material in Croatian. If you enroll in this course, we assume that you're accepting these terms and that you have a good enough command of English to follow the lectures and participate in class.
  2. In the second half of the course, the classes will revolve exclusively around paper reading sessions. What this means is that you will be asked to read scientific papers (which are in English) published at recent and renowned conferences, summarize the papers, answer key questions about them, and finally participate in discussions, which we will have together in class. There will be no way around this: reading sessions are an integral part of the course, we're doing them for you and you only, and you can't make up for them by doing something else. Furthermore, you will have to attend all paper discussions, with at most one absence, which you will have to compensate for by answering additional paper-related questions. Why are we doing all this? Because we firmly believe that being able to read, review, and discuss scientific papers is a tremendously important skill, regardless of whether you intend to pursue an academic career. We're also doing it because we found out it's a much better and more amusing way to engage with the topics we want to cover. And while most students felt that reading and discussing papers was indeed a lot of fun, it is certainly not for everybody.
  3. The central activity of the course is the project work. Projects are done primarily in the second half of the course and revolve around a practical and trendy information retrieval or natural language processing task. You get to choose a topic from a list of topics. Three points deserve a mention here. First, the projects are done in 2-3 person teams. There is no way around this; you can't do the project on your own, and you have to team up by yourself. Second, the project results will need to be wrapped up in the form of a short scientific paper. You can write the paper in English or Croatian. Third, there will be a short (5 min) project presentation at the end of the course. This again can be done in English or Croatian, and it suffices that one project member presents your work.
  4. Though machine learning is not a formal prerequisite for TAR, taking the course without knowing the ML basics will probably cause much frustration. Here we don't necessarily mean that you should have completed FER's ML course: any other course or self-study that provided you with the basics of ML will be fine. On the other hand, if you have had absolutely no prior exposure to ML, we don't advise enrolling in TAR.
Author: Jan Šnajder