Paul Sanderson - Deciphering Big Data Reflection

Deciphering Big Data is the first module for my Master of Science. I didn't know what to expect, and I was pleasantly surprised by how well structured the learning activities were. There was a very good blend of different types of activities, which built on each other.

What?

The collaborative discussions worked well, with their sequence of initial post, peer feedback and summary post. I hadn't used references for a very long time, so it was challenging at first to use the correct format, but I am now much more confident with Harvard referencing.

I love using HTML, so I really enjoyed building up my e-portfolio using a GitHub website. I tried to keep up with the activities as the module progressed, so there wasn't too much to add at the very end of the module.

The seminars were a great way to connect with the tutor and other students. The tutor always encouraged the students to contribute, with questions everyone could answer, and there was time at the end of most seminars for general questions and answers. I always tried to contribute relevant questions and comments to these sessions.

The main project builds progressively, starting with a team project to give support and feedback, flowing into an individual project which has a higher grade weighting. I was very fortunate to be able to form a team with two other students who regularly attended the seminars, and they were very committed and easy to work with. I am good at encouraging others, and I put this to good use in our discussions and online video calls - we had a very good team spirit. I was proud to be able to bring my technical expertise to the group, although on reflection I realise I skewed the team project too far towards the use of Python and SQL.

I personally found the core reading the most useful activity, particularly the textbooks Python for Data Analysis by McKinney and Data Wrangling with Python by Kazil and Jarmul. These two books were very clearly written and helpful - I'd like to read them again! The wiki activities usually involved applying what we learned in the textbooks, and I really enjoyed testing out my Python skills on practical examples.

I was worried about the time commitment of the study, but I've been a full-time teacher for a very long time, and I was able to apply my time management skills from teaching to my study. Every week I carefully looked at the required tasks, and spread them across my available time for that week. I was pleased that when I got to the end of the module I didn't have an overwhelming amount of work to complete. This was particularly true with the e-portfolio, as I had made a page for each e-portfolio task as the tasks were given.

So what?

I have worked with data many times in my career. During the pandemic I helped a large school manage its online assessment and exam results, so I have some practical experience with the challenges of big data. The readings really helped me gain a broader and deeper understanding of different kinds of big data and of challenges I hadn't previously been aware of. The readings also made me much more aware of the commercial aspects of working with big data - managing big data can be very expensive in terms of time and software/hardware resources, so efficient techniques are vital for adding "business value". I also became much more aware of GDPR and similar regulations for using data ethically - this will be vitally important to any work I do with data in the future.

I had previously only used Python a little, so I was looking forward to learning more. I made time to try a lot of the examples in the Python textbooks, and I now feel much more confident using Python to manipulate and analyse data. I learned to use Python to scrape data from a web page, convert unstructured data to JSON format, examine semi-structured data and prepare it for a database, and access an API. I was able to show evidence of this in my e-portfolio, particularly in the web scraping assignment, the database design project, and the API security task.

I think my main area of learning came from the team project. We were very pleased with our final product, but in his feedback our tutor said we didn't fully meet the requirements. I felt disappointed and upset at first, but now I see that our report was too technical (it included Python and SQL code) and we should have included a critical discussion of the available options, rather than just describing in technical detail the option we chose. On reflection I think we misunderstood the assignment - we were explaining our solution to the tutor, rather than recommending our solution to the imaginary client.

Now what?

I learned that it is vitally important to carefully read the assignment instructions and not just assume I understand what is required. In my future studies, I plan to spend more time on fully understanding the requirements of an assignment, rather than rushing to get started.

In my future career, I am now better prepared to work with different kinds of big data, and I'll be more aware of subtle challenges and how to manage them, especially with Python tools.

Thank you, Dr. Ali Zalzala, for guiding us through this module and helping us become adept at deciphering big data!