Announcements

The summer term, for those who applied via e-mail, will be held on Tuesday, Jul 10 at 12:00 (noon) in room D337 (if the department doors are locked, wait in front of the main ZEMRIS entrance).

Author: Mladen Karan

The TAR course is finished for this year. The very last thing that we will ask of you is to take a few minutes to fill out the final survey with any comments you may still have or want to emphasize (participation is encouraged but optional).

We all had a good run and we feel honored for having the opportunity to work with so many bright and motivated students. While this may be the end of TAR 2018, we hope it is only the beginning for many of you in the sense of further honing your NLP and paper-reading skills. We, by all means, encourage you to continue learning and growing, hopefully using some of the knowledge you picked up along the way. Good luck and have fun!

Author: Mladen Karan

Your grades are now visible in Ferweb. If you want to decline your grade, please bring the corresponding form to the ZEMRIS secretary by Wednesday, Jul 4 at 2:00 p.m. The exact time and location of the summer term (primarily intended for those who missed the midterm) will be announced shortly (the information currently visible in Ferweb will change). If you wish to take part in the summer term, please let us know by e-mail no later than Wednesday, Jul 4 at 8:00 p.m.

Author: Mladen Karan

Scores for the answers submitted via the Google forms are now visible in Ferweb. In addition, if you missed submitting some of the forms or did not attend some of the paper reading sessions, you should have received a mail with additional questions and instructions.

Author: Mladen Karan

Due to multiple inquiries, we would like to remind you that for the camera-ready version of your paper it is obligatory to address only the "must-fix" comments of the reviewers. This should amount to minor typographical changes or strategically adding a sentence or two to resolve ambiguity or add missing information.

However, if you wish to fully address the reviewers' comments and make more substantial changes to your paper (which might include re-running the models or even revising your experimental setup), you are more than welcome to ask us for advice after TAR 2018 is officially concluded. Considering that we probably won't publish the proceedings before November, there will be plenty of time to make your work pitch perfect. 

Author: Mladen Karan

Some teams put a truly valiant amount of effort into their projects and have earned bonus points. The number of candidates was considerable, which made this a rather hard decision. Still, we have decided that the following teams truly deserve additional recognition for their efforts.

  • Extra light year (+2 points) -- Tenzori (Antonio Šajatović, Tome Radman, and Lukrecija Puljić) for their paper "Self-attentive Similarity Based Approach for Community QA Ranking"
  • Extra mile (+1 point) -- Coolback-Leibler (Mihaela Bošnjak, Grgur Kovač, and Franko Šesto) for their paper "Personalized Medicine: Redefining Cancer Treatment Classification Using Bidirectional Recurrent Convolutions"
  • Extra mile (+1 point) -- The Mean Squares (Marin Drabić, Dora Marković, and Luka Suman) for their paper "Plain Text Enrichment Using Tweets and Emojis"

Congratulations, we tip our hats to you!

Author: Mladen Karan

The reviewer discussions are finished and you should have received the final scores for your team via e-mail. Each team gets <number of team members> x <number of points> to distribute among its team members. You are free to distribute the points in any way you like, but you need to satisfy the constraint that no member gets more than 50 + <number of bonus points>. For example, a team of 3 which got 44.5 points and 1 bonus point can distribute 3 x 45.5 points, with each member getting at most 50 + 1 points. You need to send us the point distribution by Sunday, Jul 1 at 11:59 p.m.
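For illustration only, the rule above can be checked with a few lines of Python before you send us your distribution. This is a minimal sketch, not an official tool: the function name `check_distribution` is made up for this example, and it assumes that the pool is computed as in the example above (team score plus bonus, times the number of members) and that a split may not hand out more than the pool.

```python
def check_distribution(team_score, bonus, split):
    """Return a list of rule violations for a proposed per-member split.

    team_score: final project score of the team (e.g. 44.5)
    bonus:      bonus points awarded to the team (e.g. 1)
    split:      points proposed for each member, one entry per member
    """
    pool = len(split) * (team_score + bonus)  # e.g. 3 x 45.5
    cap = 50 + bonus                          # e.g. 50 + 1
    problems = []

    # Assumption: the split may not distribute more than the pool.
    if sum(split) > pool + 1e-9:
        problems.append(f"total {sum(split)} exceeds the pool of {pool}")

    # Stated rule: no member gets more than 50 + bonus points.
    for i, points in enumerate(split, start=1):
        if points > cap:
            problems.append(f"member {i} gets {points}, above the cap of {cap}")

    return problems


# The team from the example: 3 members, 44.5 points, 1 bonus point.
print(check_distribution(44.5, 1, [51, 45.5, 40]))  # a split that respects both rules
print(check_distribution(44.5, 1, [52, 44.5, 40]))  # member 1 is above the cap
```

An empty list means the proposed split satisfies both checks.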

Author: Mladen Karan

The reviews are finished; you should have received them at the e-mail address with which you registered on EasyChair. Each paper received two reviews. You may consider the per-category scores you received (averaged over the two reviewers) a good approximation of your final project score. However, to make the process even fairer, the reviewers will convene in the following days, discuss their reviews, and possibly make slight tweaks to your scores. You can expect the scores to be finalized by the end of the week.

As reviewers are only human, it is possible for them to make mistakes. To account for this, we are offering you the chance to submit a rebuttal. If you feel very strongly that the reviewers misunderstood or missed something that you have stated in the paper, you can send us an e-mail with a rebuttal. We expect you to do this if and only if you have strong arguments based on what is already contained in the paper. If you decide to submit a rebuttal, the deadline for it is this Wednesday at 1 p.m.

You can start working on the final ("camera-ready") versions of your papers by addressing the "must fix" remarks of the reviewers. It shouldn't take you more than an hour or two, depending on the review. The deadline for this is Monday, Jul 2 at midnight. Simply upload the new version over the old one in EasyChair.

Author: Mladen Karan

Due to many other parallel obligations, we've (rather conveniently) decided to extend our deadline for giving you reviews for your papers from June 22 to June 25 (end of the day). To compensate for this, we are also extending your deadline for implementing the "must have" review comments in your paper from June 29 to July 2 (end of the day).

Author: Mladen Karan

The schedule for the project presentations is finished. Keep in mind that we decided to add a third slot on Friday morning, which is still to appear in your calendars (but we really couldn't do without it :( ). It is sufficient that a single team member presents the project results (but other members are encouraged to participate as well). Additionally, all participants should attend all other presentations from all slots (except in cases of conflicts with other subjects etc.). If none of the team members is able to present your work in the assigned slot, please let us know as soon as possible.

You should prepare:

  •  a short presentation of your work and results;
  •  a short demo showing your system in action (it can be very simple as long as it demonstrates the functionality of your system).

Please keep your presentation (including the demo) under 10 minutes; it will be followed by a 5-minute Q&A session.

You can find the current schedule (subject to minor changes) in the extended post.

Author: Mladen Karan

There have been 24 successful (and 0 unsuccessful) submissions. If you did not receive an e-mail from us regarding your project, that means that everything went smoothly with your submission. The next steps are the project presentations, which will be held on Wednesday, June 13 and Thursday, June 14. More information on this, as well as the presentation schedule, is coming soon!

Author: Mladen Karan

The topic of the fifth paper reading session is authorship analysis. You are required to read the following paper and submit your reviews:

Flekova, L., & Gurevych, I. (2015). Personality profiling of fictional characters using sense-level links between lexical resources. EMNLP 2015

Paper discussion will be held on Friday, Jun 15 during the class. Should you miss it, you'll need to hand in the answers for an additional set of questions we'll publish after the class.

Make sure to fill out the form by Thursday, Jun 14 at midnight.

Author: Mladen Karan

In the interest of getting your projects to the best possible quality (all the while preserving your sanity), we've decided to give you some extra time! The deadline for the project submission has been extended to Sunday, Jun 10 at 11:59 p.m. CET. The extra office hours slot tomorrow morning (in room D339-a) is still available, should you need it.

Good luck!

Author: Mladen Karan

When submitting your project you should prepare the following.

  1. Your project report in PDF format
  2. The TeX source files required to build the PDF of your project report (if you used packages other than those used in the template, you don't have to include them; also omit intermediate LaTeX compilation files such as *.aux, *.log, etc.)
  3. The source code of your project (you needn't include packages, data sets, or binaries; the textual source code files or notebooks are sufficient)

You should submit item 1 to the EasyChair conference management system at this address (you will likely have to create an account). If you wish, the system allows you to revise your PDF until the deadline. After submitting the PDF, you should also send items 2 and 3 via e-mail to mladen.karan@fer.hr (as a zip file, a Dropbox/GitHub/Bitbucket link, or in any other reasonable way you prefer). It is enough to complete these steps once per team.

The deadline for this is Friday, June 8, 11:59 p.m. CET.

Author: Mladen Karan

The deadline for submitting your projects is Friday, June 8, at midnight. Instructions with more details concerning the submission will be posted next week.

Please pay attention to the relevant parts of the Project Vademecum, particularly the fact that, even though you will have to submit your code alongside your paper, your grade is derived exclusively from your paper and presentation. It is therefore very important that your paper is well written. We advise you to test the clarity and quality of your paper by getting an internal review from a member of some other project team (ideally a team that does not have the same project topic as you), who can point out anything you should fix.

If you have any urgent last-minute questions or problems about your projects, you can contact us by e-mail at mladen.karan@fer.hr. Moreover, we have arranged an additional office-hours slot for last-minute questions about your papers on Friday, June 8 from 9 to 10 a.m.  

Good luck with the final touches to your projects!

Author: Mladen Karan

The topics of the fourth paper reading session are semantic similarity and document summarization. You are required to read the following two papers and submit your reviews:

Paper discussion will be held on Friday, Jun 8 during the class. Should you miss it, you'll need to hand in the answers for an additional set of questions we'll publish after the class.

Make sure to fill out the form by Thursday, Jun 7 at midnight.

Take it easy and enjoy the papers!

Author: Jan Šnajder

For the third paper reading session, we'll cover the topic of sentiment analysis. You are required to read one paper and write a report.

Please pick one paper from the list:

Paper discussion will be held on Wednesday, May 30 during the class. Should you miss it, you'll need to hand in the answers for an additional set of questions we'll publish after the class.

Make sure to fill out the form by Tuesday, May 29 at midnight.

Relax and enjoy the papers!

Author: Mladen Karan

The third paper reading session, initially scheduled for Friday, June 1, has been moved to Wednesday, May 30 from 12 to 2 p.m. in lecture room D-273. The deadline for submitting the reviews for this session will be Tuesday, May 29 at midnight.

Author: Mladen Karan

Since the sessions and reports are a substitute for the final exam, we take them quite seriously. Even if you skip a reading session (which you are allowed to do only once), you still have to submit the reviews via Google forms on time, just like everybody else. Please bear that in mind for the upcoming and all future paper reading sessions.

Author: Jan Šnajder

The second reading session covers question answering. You are required to read one paper and write a report for it. You can find more information about the papers and links to the corresponding forms below.

Please pick one paper from the list:

The report forms are online, so you should start reading right away! We'll discuss the papers next Friday, May 25 during the class. Should you miss it, you'll need to hand in the answers for an additional set of questions we'll publish after the class.

Make sure to fill out the form by Thursday, May 24 at 7 p.m.

Enjoy! :)

Author: Mladen Karan
Results of the midterm exam

The results of the midterm exam are visible in FerWeb. You can come and take a look at your exams on Wednesday, May 16 at 2 p.m. in room D-305.

Author: Mladen Karan

Next week, on Tuesday and Wednesday, May 8 and 9, we'll organize the project checkpoint. It will be held in room D-305 (inside the ZEMRIS department). If the doors are locked, please wait at the department entrance. Note that it is not mandatory for all team members to be present at the checkpoint.

You can find the time slots below.

EDIT: We have tweaked the schedule below to minimize unnecessary waiting; be sure you don't miss the changes.

Author: Mladen Karan

The project checkpoint sessions will be held on Tuesday and Wednesday, May 8 and 9. Attending the project checkpoint session is mandatory for all teams (though it is not mandatory for all team members to be present). You will be asked to deliver an informal presentation of what you've done so far, so that we can ensure that you're going in the right direction. Moreover, based on your presentation, we'll get an initial impression of your efforts, which will affect your final project score. We're asking you to take this very seriously and come prepared.

For those who are still unsure about what to present, we refer you to the project vademecum, where it says:

The project checkpoint stands as a final check before the final project submissions. For the checkpoint, you are required to already have a certain portion of work done: you should at least have a working baseline system, evaluated using the official evaluation metrics.

The exact time slots for each team will be announced next week.

Author: Mladen Karan

The midterm exam will be held on Monday, April 23 at 11:30 a.m. in the lecture room D-1. It might be the case that the exam slot is not visible in your FerWeb calendars, so pay extra attention. The duration of the exam is 90 minutes. Besides general stationery, you are allowed to bring a calculator as well.

The midterm exam covers topics 1--7 (consult the course syllabus). It consists solely of short open-response questions in which you demonstrate understanding of the material. Examples of such questions would be "Explain advantages of approach X compared to approach Y." or "What would happen in method X if we skip step Y?" In some of the questions, you might be asked to perform minor calculations (similar to those required for some of the multiple-choice questions or shorter problem questions in the previous years' exams). You can find the old midterm exams in the repository, but keep in mind that they are not fully representative of the exam format for this year. To make it easier for you to prepare for the exam, consult the learning outcomes.

Author: Mladen Karan

The topic of the next lecture is Text Classification, Clustering, and Latent Models. To prepare for the class, please find time to read the following:

  • IIR Chapter 13: Text classification & Naive Bayes
    • Section 13.3
  • IIR Chapter 18: Matrix decompositions and latent semantic indexing
    • Section 18.4
Author: Mladen Karan

Keep in mind that the paper reading sessions in the second part of the semester fill the role of a standard final exam. Consequently, attending paper reading sessions is a requirement for passing the course. You may miss at most one session, but will then be asked to hand in answers to additional questions about the paper you've read. Please refer to slide 72 in the introductory lecture for more details.

Author: Mladen Karan

The topic of the next lecture is Improved Search, Evaluation, and Web Search. To prepare for the class, please find time to read the following:

  • IIR Chapter 9: Relevance feedback and query expansion
    • Sections 9, 9.1, 9.1.3, 9.1.5, 9.1.6, and 9.1.7
  • IIR Chapter 21: Link Analysis
    • Sections 21, 21.1, 21.2, and the beginning of 21.3
Author: Mladen Karan

We've finished assigning the project topics to teams. The resulting assignment is fairly good: only four teams were not assigned their most preferred topic (but did get their second choice). Please find what topic you've gotten in the extended post.

Having the project topic assigned to your team, you can start working on your projects! The next step is the checkpoint on May 7-9, where we will check on your progress. If you are unsure about how to start and think you need additional guidelines, feel free to visit us during the office hours; that's what we're here for.

Author: Mladen Karan

In case you haven't been able to find a project team, please send an e-mail to mladen.karan@fer.hr by 11:59 p.m. today (you may optionally include topic preferences).

Author: Mladen Karan

Answering TAR-related questions over e-mail is extremely tedious: it's often ambiguous, it lacks helpful visualizations, and, most importantly, it takes a lot of time on both sides. To make our lives easier, please come to the office hours each Friday at 2 p.m. To apply, fill out this form. Please describe in the form what bothers you since, being only human, we cannot know everything about every TAR topic in advance. We'll make sure to prepare for your questions beforehand.

You must apply for the office hours by Wednesday at 11:59 p.m. at the latest. We'll then contact you with information about the location.

Author: Mladen Karan

The topic of the next lecture is Basics of Information Retrieval. To prepare for the class, please find time to read the following:

  • IIR Chapter 1: Boolean retrieval
    • Section 1.1
  • IIR Chapter 6: Scoring, term weighting & the vector space model
    • Sections 6.2, 6.2.1, 6.2.2, 6.3, and 6.3.1
  • IIR Chapter 12: Language models for information retrieval
    • Sections 12.1.2, 12.1.3, and 12.2.1
Author: Mladen Karan

For more details on the project topics, please visit links provided alongside topic descriptions (especially competition websites, where available). In case of any questions, please let us know.

You can bid for the projects by filling out this form. Only one member of the team is required to send the team's information and preferences. Bidding will be open until Wednesday, March 21 at 11:59 p.m. Each team must consist of three members (or two if you couldn't find a third member or think you can handle the given task with fewer members). After the deadline, all students without a team will be assigned one by the TAs.

We will do our best to optimize your choices and assign to each team their highest pick possible (time of submission is not a factor in the optimization). You can expect the topics to be assigned on Friday, March 23.

Author: Mladen Karan

You can find the topics for this year's TAR project at this link and a project vademecum (because why not :) at this link. You should form teams of 2-3 (preferably 3) people and figure out which topics interest you the most. Further instructions on how to sign up your team and define your topic preferences via a Google form will be posted shortly.

Author: Mladen Karan

The topic of the next lecture is Basics of Natural Language Processing. To prepare for the class, please find time to read the following:

  • FSNLP Chapter 1: Introduction
    • Sections 1.1, 1.3, 1.4.1, and 1.4.2
  • FSNLP Chapter 3: Linguistic Essentials
    • Sections 3.1, 3.2, and 3.3

Note: Sections do not include subsections, e.g., you are required to read Section 3.1, but not 3.1.1. Of course, feel free to read them as well if you want to!

Author: Mladen Karan

The Text Analysis and Retrieval course starts on Friday, March 9 at 11:15 a.m. in lecture room D-273. Welcome aboard!

Author: Mladen Karan
TAR 2018: Important information for...

We're excited to announce the fifth edition of the Text Analysis and Retrieval (TAR) course. If you're interested in search engines, text analysis, statistical natural language processing, and the application of machine learning to natural language processing, then this course is for you. However, since TAR might be a bit different from the courses you took in the past, we ask you to take into account the following information before you make your final decision about enrolling in the course.

  1. TAR is taught in English only (level L3). All course material, including exams, will be in English. There is no "Croatian group" and no material in Croatian. If you enroll in this course, we assume that you're accepting these terms and that you have a good-enough command of English to follow the lectures and participate in class.
  2. In the second half of the course, the classes will revolve exclusively around paper reading sessions. What this means is that you will be asked to read scientific papers (which are in English) published at recent and renowned conferences, summarize the papers, answer key questions about them, and finally participate in discussions, which we will have together in class. There will be no way around this: reading sessions are an integral part of the course, we're doing them for you and you only, and you can't make up for them by doing something else. Furthermore, you will have to attend all paper discussions, with at most one absence, which you will have to compensate for by answering additional paper-related questions. Why are we doing all this? Because we firmly believe that being able to read, review, and discuss scientific papers is a tremendously important skill, regardless of whether you intend to pursue an academic career. We're also doing it because we found out it's a much better and more amusing way to engage with the topics we want to cover. And while most students felt that reading and discussing papers was indeed a lot of fun, it is certainly not for everybody.
  3. The central activity of the course is the project work. Projects are done primarily in the second half of the course and revolve around a practical and trendy information retrieval or natural language processing task. You get to choose a topic from a list of topics. Three points deserve a mention here. First, the projects are done in 2-3 person teams. There is no way around this; you can't do the project on your own, and you have to team up by yourself. Second, the project results will need to be wrapped up in the form of a short scientific paper. You can write the paper in English or Croatian. Third, there will be a short (5 min) project presentation at the end of the course. This again can be done in English or Croatian, and it suffices that one project member presents your work.
  4. Though machine learning is not a formal prerequisite for TAR, taking the course without knowing the ML basics will probably cause much frustration. Here we don't necessarily mean that you should have completed FER's ML course: any other course or self-study that provided you with the basics of ML will be fine. On the other hand, if you have had absolutely no prior exposure to ML, we don't advise enrolling in TAR.
Author: Jan Šnajder